Interface Explorations 26
Editors
Artemis Alexiadou
T. Alan Hall
De Gruyter Mouton
Consonant Clusters and
Structural Complexity
edited by
Philip Hoole
Lasse Bombien
Marianne Pouplier
Christine Mooshammer
Barbara Kühnert
ISBN 978-1-61451-076-5
e-ISBN 978-1-61451-077-2
ISSN 1861-4167
Structural complexity of consonant clusters: A phonologist's view
Theo Vennemann
Abstract
This paper attempts a definition of consonant clusters, consonant cluster complexity, and cluster complexity reduction in a phonological perspective. In particular, since at the present stage of our knowledge a metrical (and thus: general) definition of consonant cluster complexity is not possible, a relative and structure-dependent concept is proposed: only clusters within the scope of one and the same preference law can be compared, namely evaluated as the more complex the less preferred they are in terms of that preference law. This concept, as well as ways in which cluster complexity is reduced, is illustrated with examples from various languages. They include word-initial muta-cum-liquida reductions in Spanish and Portuguese, certain cases of metathesis at a distance (e.g. Spanish periglo > peligro 'danger'), and slope displacements as in Old Italian ca.'pes.tro > ca.'pres.to 'rope', Tuscan pa.'dro.ne > pra.'do.ne 'lord, employer'. The opposite kind of development, namely the formation and complexification of clusters, is argued for the most part not to be motivated by syllable structure preferences but (a) by a variety of syntactic and morphological processes and (b) in phonology itself by rhythmically induced syncopations (e.g. syncope in Latin periculo > Spanish periglo), or to result from borrowing.
Let us begin with the question of what we mean when speaking about consonant clusters. What would be a suitable definition? Since I am a phonologist rather than a phonetician, all the definitions that follow will be phonological rather than phonetic.
The Oxford English Dictionary defines a cluster as "a collection of things of the same kind, as fruits or flowers, growing closely together; a bunch, originally of grapes" [!]. The word is attested in the language as early as the year 800. It is assumed to be a -tro-derivate of the same root that we also have in clot, clout, and cleat, German Klotz and Kloß.
In any event, a cluster consists of discrete elements, a consonant cluster of discrete consonantal elements. In traditional phonetics one learns that phonetic objects are continua. Hence a consonant cluster as a phonetic object would have to be a continuum, and that is what a cluster by definition is not. Philip
Hoole (p.c.) has assured me that modern phonetics can show that a degree of segmentation already occurs at the articulatory level, rather than only on the mental articulatory retina (for which cf. Tillmann/Mansell 1980), and that within the so-called gestural framework (Browman and Goldstein 1986, 1989, 1992), gestures whose coordination is part of a word's lexical representation "bear a close relationship to those conglomerates of gestures that constitute what is traditionally considered to be a segment" (Byrd 1996: 160).
However that may be, phonologists are dealing exclusively with discrete objects. Therefore in that regard they have no problem defining a consonant cluster, namely as a set of consonants understood as discrete objects, or more precisely as an uninterrupted sequence of two or more consonants within some well-defined unit of language, such as a syllable, word, or phrase. And if phonologists do have a problem, it is because they do not know for sure what a consonant is, an uncertainty which may also hold for phoneticians. For example, is the second speech sound in twist, twinkle, twine, twenty, twaddle, etc. and in quick, quest, quiet, quota, etc. a consonant or a vowel? If it is a consonant, then the words twist and quick begin with a consonant cluster. If the second speech sound is just the vowel /u/ in a syllable margin, namely in a complex syllable head, then those words do not begin with a consonant cluster, but rather with a sequence of consonant and vowel within a syllable head. Perhaps that is actually what phonologists mean when speaking of consonant clusters: an uninterrupted sequence of marginal speech sounds, i.e. a sequence of speech sounds not interrupted by a syllable nucleus (nor, of course, by a pause). And this may be the only legitimate meaning if we take seriously the idea that the speech sounds of any language can be arranged hierarchically on scales of increasing consonantality, or decreasing sonority, without any break-off point, as in (1).
This particular scale is the one presented in Vennemann (1988: 9). There are other arrangements. Some authors use finer scales, for example scales which hierarchize obstruents and nasals by place of articulation, and vowels on the frontness parameter. Conversely, there are less fine-graded scales, such as scales lumping all obstruents or all vowels together or not distinguishing lateral and central liquids in terms of strength. Thus one often sees the simple scale V L N F P (vowels, liquids, nasals, fricatives, plosives). For some languages even this scale may prohibit certain generalizations. The only scale for which I have never seen contrary language material is V R O (vowels, sonorants, obstruents). The above scale may be the most fine-graded that most linguists can agree on. When finer distinctions are made, language-specific differences begin to play a role, and linguists will begin to differ.
The scalar nature of the consonantality, or conversely the sonority, of the
speech sounds in any language is a venerable concept, much worked with by
Sievers (1901), among others. The history of the concept is described in chapter
2 of Murray (1988).
Turning now to the question of clustering, there follow some definitions, (2) to (7).
(2) A cluster is an uninterrupted sequence of cardinality greater than one.
Mathematicians would undoubtedly let the cardinality begin with zero, i.e., they would admit empty clusters and unit clusters. But in everyday usage a cluster of objects contains at least two objects. The Oxford English Dictionary expresses that much by defining a cluster as "a collection of things". Indeed, we would not, except perhaps jokingly, call a single painting, or no painting at all, an art collection.
(3) A consonant cluster is a cluster of marginal speech sounds (i.e., a cluster
of speech sounds not interrupted by a nuclear speech sound).
With C for marginal speech sounds and V for nuclear speech sounds, and with
$ (or a period, .) for a syllable boundary, CC, C$C, CCC, C$CC, CC$C etc.
are consonant clusters, CVC, CV$C, CVCC, CCVCC, CV$CC etc. are not.
(4) A head cluster is a consonant cluster entirely within a syllable head.
(5) A coda cluster is a consonant cluster entirely within a syllable coda.
(6) An intersyllabic cluster is a consonant cluster containing both coda and
head speech sounds.
C$C, CC$C, C$CC etc. are intersyllabic clusters.
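Definitions (2) to (6) are mechanical enough to be stated as a small sketch. Assuming, as above, strings over C (marginal speech sound), V (nuclear speech sound), and $ (syllable boundary), the hypothetical function below extracts and classifies consonant clusters; the function name and the string representation are illustrative only, not part of the paper's apparatus.

```python
def consonant_clusters(word):
    """Classify consonant clusters in a string of C (marginal sound),
    V (nuclear sound) and $ (syllable boundary), per definitions (2)-(6)."""
    margins = word.split('V')                 # stretches between nuclei
    clusters = []
    for i, m in enumerate(margins):
        if m.count('C') < 2:                  # (2): cardinality must exceed one
            continue
        core = m.strip('$')                   # ignore boundaries at the edges
        if '$' in core:
            kind = 'intersyllabic'            # (6): a $ separates the C's
        elif i == 0 or m.startswith('$'):
            kind = 'head'                     # (4): wholly before its nucleus
        else:
            kind = 'coda'                     # (5): wholly after its nucleus
        clusters.append((core, kind))
    return clusters
```

For instance, consonant_clusters('CCVC$CVC') yields a head cluster CC and an intersyllabic cluster C$C, matching the judgments under (3) to (6).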
clusters can be compared and judged more or less complex, or even some
numerical scale or measure for the structural complexity of consonant clusters,
then phonology cannot help. But we can do two things below that level of
generality. First, we can compare any two consonant clusters in terms of
structural complexity that are on one and the same quality scale of one of the
preference laws, viz. the Head Law, the Coda Law, and the Contact Law,
cf. (9).
(9) The structural complexity of consonant cluster A is greater than that of
consonant cluster B if B is more preferred than A in terms of one of the
preference laws.
Since the preference laws for syllable structure refer to structural aspects of
syllables, it follows that cluster complexity is structure-dependent. For example, kl is less complex than lk in syllable heads but more complex in syllable
codas. This is recognized in (9) by relativizing complexity comparisons to a
particular preference law, in this case either to the Head Law or to the Coda
Law. It makes no sense, in this framework, to ask which of the two clusters,
kl or lk, is less complex an sich, i.e. without such structural relativization.
Second, we can say what contributes to the structural complexity of consonant clusters, if we correlate this concept with that of linguistic quality in
terms of preference, i.e. of graded naturalness, cf. (10).
(10) Every property that makes a consonant cluster less preferred relative to
some other consonant cluster contributes to the structural complexity of
the given consonant cluster.
Let us illustrate (9) above with a straightforward example that everyone knows.
When initial consonant clusters of a plosive and a sonorant are eliminated (eliminated because not only clusters of cardinality greater than two are complex but all clusters are; only single consonants are good), there is an order that may not be broken. Thus, in English, on the partial scale in (11),
(11) English: *kn- ?kl- kr-
→ increasing quality of head clusters
all three clusters existed in Old English, as they do in Contemporary German. In Contemporary English, the worst of these clusters is gone: German Knie is English knee, where the k- is still spelled but no longer spoken, nor is it speakable; the cluster is ungrammatical as a word-initial head cluster. The next on
the quality scale, kl-, is unstable in some dialects, t or a glottal stop or something else, barely audible, being spoken instead of k (cf. Luick 1914: 801, 802, Fisiak 1980, and Lutz 1991: 251–254 with further references). The cluster kr- is intact everywhere. As phonologists we can explain this by reference to the Head Law, part (c). If one compares (11) to (1) farther up, it becomes
obvious that the sonorants in the clusters are arranged along the scale of
Consonantal Strength. Part (c) of the Head Law, the preference law for the
structure of complex syllable heads, says that a head cluster is the more preferred the sharper the Consonantal Strength drops from the first head speech sound to the next. And as one can see in (11.a),
(11.a) k n l r
→ decreasing Consonantal Strength
the drop from k to n is smallest, which means kn- is most disfavored. Many
phonologists see this as an explanation for the loss of kn-: the English development instantiates a universal preference law. This preference law in turn is a
generalization gained by many phonologists looking at the structure and
changes in many languages (see for example Greenberg 1978).
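The reasoning behind (11) and (11.a) can be sketched programmatically. The strength values below are illustrative ranks read off the partial scale k > n > l > r, not figures from Vennemann (1988), and the function name is invented for the sketch.

```python
# Head Law, part (c): a head cluster C1C2- is the more preferred the
# sharper the Consonantal Strength drops from C1 to C2.
# Illustrative ranks on the partial scale k > n > l > r of (11.a):
STRENGTH = {'k': 4, 'n': 3, 'l': 2, 'r': 1}

def head_drop(cluster):
    """Strength drop from the first to the second head consonant."""
    c1, c2 = cluster
    return STRENGTH[c1] - STRENGTH[c2]

# Rank the three Old English head clusters from worst to best:
# the smallest drop (kn-) is most disfavored, mirroring *kn- ?kl- kr- in (11).
ranked = sorted(['kn', 'kl', 'kr'], key=head_drop)
```

Under these illustrative values, ranked comes out as kn-, kl-, kr-, reproducing the order on the quality scale in (11).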
Looking at the quality scale for initial consonant clusters of plosive and sonorant in more general terms (cf. the scale in (12)),
(12) A quality scale for CC heads, with plosives (P) as onset and plosives (P), fricatives (F), nasals (N), lateral liquids (l), central liquids (r), and semivowels (V) on the slope
PP- PF- PN- Pl- Pr- PV-
→ increasing quality of head clusters
we see in (13) to (19) how nicely languages range from instantiating the full scale to no such head clusters at all.
(13) Classical Greek
PP- PF- PN- Pl- Pr- PV-
+ + + + + (+)
(14) Contemporary Greek, German
PP- PF- PN- Pl- Pr- PV-
+ + + + +
with degeminating first in syllable heads and codas, later generally: scāf, skāf- > Schaf [ʃaːf] 'sheep', wascan > waschen [ˈvaʃn̩] 'to wash', tisc, tisk- > Tisch [tɪʃ] 'table'. The same cluster was soon reintroduced in loanwords: Skat (a card game, < Ital. scarto 'discarded playing cards'), Skandal, Skrupel, Sklave, Maske, grotesk.
organized their subject matter, I would like at this point to do just the opposite,
i.e., point out that in reality we do not really understand how complexity
problems of this sort are solved in any given case. Not only can we not predict
whether or when a complexity problem comes under attack, we also cannot
predict which of several possible solutions to the problem will be chosen,
so to speak. For example, we understand perfectly that a head cluster C1C2-
that is dispreferred according to the Head Law, part (c), is structurally complex
and therefore likely to come under attack. But whether the problem is resolved
by deleting the onset consonant C1 as too weak or the slope consonant C2
as too strong, or by manipulating the strength of one of the two, namely by
strengthening C1 or by weakening C2, and with what result, or whether a
vowel will be inserted to break up the cluster and achieve a nice C1V.C2V
sequence, or whether the cluster will be partially removed from the head posi-
tion by prosthesis and heterosyllabized as a medial cluster, VC1.C2V, we do
not yet know. All of these measures are on record for various languages, see
the partial illustration in (23) to (31).
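The space of repair strategies just listed can be put into a small sketch; the function and the strategy labels are illustrative, and the two featural repairs (strengthening C1, weakening C2) are omitted since they manipulate features rather than the segment string.

```python
# A sketch enumerating the structural repairs the text lists for a
# dispreferred head cluster C1C2-. The paper's point stands: we cannot
# predict WHICH of these candidates a given language will choose.
def candidate_repairs(c1, c2):
    """Candidate resolutions for a word-initial head cluster C1C2-."""
    return {
        'delete C1 (onset too weak)':   c2,                      # e.g. kn- > n-
        'delete C2 (slope too strong)': c1,
        'anaptyxis (C1V.C2V)':          c1 + 'V.' + c2 + 'V',    # vowel insertion
        'prosthesis (VC1.C2V)':         'V' + c1 + '.' + c2 + 'V',
    }

repairs = candidate_repairs('k', 'n')
# the attested English repair for kn- is deletion of C1, cf. (23)
```

Running the sketch on kn- yields, among others, the historically attested English outcome n-.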
(23) Kn- > n- in English
(24.a) Cl- > Cr- in Portuguese, see (22) above
(24.b) Cl- > Ci̯- in Italian
(25.a) wl- > l- in English, German, Old Norse (Lutz 1997)
(25.b) wl- > bl- in English, German (Lutz 1997), also in Classical Greek
(25.c) wl- > - in German dialect (Lutz 1997)
(25.d) wr- > r- in Scandinavian, English, German dialects (Lutz 1997)
(25.e) wr- > br- in English, German (Lutz 1997)
(26.a) hn- > n- in almost all of Germanic (Lutz 1997)
(26.b) hn- > gn-, kn- in Scandinavian dialects (Lutz 1997)
(26.c) hn- > sn- in Swedish (Lutz 1997)
(27.a) hl- > l- in almost all of Germanic (Lutz 1997)
(27.b) hr- > r- in almost all of Germanic (Lutz 1997)
(28.a) hw- > w- in almost all of Germanic (Lutz 1997)
(28.b) hw- > kw-/kv- in Scandinavian (Lutz 1997)
(29.a) fn- > n- in almost all of Germanic (Lutz 1997)
The following are some tricky cases of sound change which, in times before phonologists' thinking in terms of graded naturalness, or preferences, developed, were simply dubbed metatheses at a distance. Let us look at (32).
(32) Lat. periculum > Span. peligro 'danger' : r - l > l - r
We see r and l changing places, a clear case of metathesis if there ever was one. How do we explain it? Do r and l simply exchange position in Spanish? Certainly not, because the change does not always happen, not even in words of the same rhythmic structure as peligro, cf. (33).
(33) Lat. alacrem > alegre 'lively, merry' : l - r > idem
So is (32) a simple case of confusion? Certainly not, see (34).
(34) Lat. miraculum > milagro 'miracle' : r - l > l - r
Lat. parabola > palabra 'word' : r - l > l - r
(32) and (34) apparently follow a rule. Is the rule then to change r - l into l - r but not conversely? Not that either, cf. (35).
(35) Lat. aprilem > abril 'April' : r - l > idem
So both l - r and r - l may remain unchanged, and the question is still why r - l metathesizes precisely in the environment set up by the group in (32) and (34), and there without exception.
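On the account developed here, the metathesis in (32) and (34) applies exactly when exchanging the two liquids sharpens the strength drop inside the muta-cum-liquida cluster (Head Law, part (c)). A minimal sketch, assuming the illustrative ranking of lateral above central liquids from the scale in (11.a); the function name and values are invented for the sketch:

```python
# Illustrative strength ranks: the lateral liquid l outranks central r,
# as on the finer scales discussed earlier in the text.
STRENGTH = {'l': 2, 'r': 1}

def improves(lone_liquid, cluster_liquid):
    """True iff swapping the lone liquid with the liquid inside the
    obstruent+liquid cluster yields a sharper strength drop there."""
    return STRENGTH[lone_liquid] < STRENGTH[cluster_liquid]

# periglo: lone r, cluster liquid l  -> swap improves gl to gr (peligro)
# alegre:  lone l, cluster liquid r  -> no improvement, hence no change
```

Applied to the data, the sketch separates (32) and (34), where the swap improves the cluster, from (33) and (35), where it would not.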
(38) $trV rather simple, Vtr$ very complex (Vrt$ less complex)
But whether a cluster is more or less complex depends not only on its position
in the syllable but also on the position of that syllable in larger structures,
especially the word. Please look at (39).
(41.a) The Early Syllable Law: All syllabic complexities are less disfavored the earlier they occur within the word.
(41.b) The First Syllable Law: All syllabic complexities are less disfavored in first syllables than in later syllables.
See the example in (42).
9. Conclusion
In the preceding sections of this paper it has been shown what structural complexity of consonant clusters, and what change (especially reduction) of consonant cluster complexity, may mean in phonology. The phoneticians will clarify and illustrate these terms in their own language. Since it counts as the hallmark of a good scientific approach to be compatible with approaches in neighboring disciplines, and since phonetics is the closest neighbor of
Appendix
Excerpts from Vennemann 1988. The laws are also cited and illustrated in
Restle and Vennemann 2001. Numbers refer to pages in Vennemann 1988,
except for the Early Syllable Law and the First Syllable Law where they refer
to Vennemann 1997.
References
Byrd, Dani
1996 A phase window framework for articulatory timing. Phonology 13:
139–169.
Browman, C. P., and Louis Goldstein
1986 Towards an articulatory phonology. Phonology Yearbook 3: 219–252.
Browman, C. P., and Louis Goldstein
1989 Articulatory gestures as phonological units. Phonology 6: 201–251.
Browman, C. P., and Louis Goldstein
1992 Articulatory phonology: An overview. Phonetica 49: 155–180.
Fisiak, Jacek
1980 Was there a kl-, gl- > tl-, dl- change in Early Modern English? Lingua Posnaniensis 23: 87–90.
Greenberg, Joseph H.
1978 Some generalizations concerning initial and final consonant clusters. In: Joseph H. Greenberg (ed.), Universals of human language, 4 vols, vol. 1: Phonology, 243–279. Stanford, California: Stanford
University Press.
Krahmalkov, Charles R.
2001 A Phoenician-Punic grammar (Handbook of Oriental Studies, Section one: The Near and Middle East 54). Leiden: Brill.
Lipski, John M.
1992 Metathesis as template-matching: A case study from Spanish. Folia
Linguistica Historica 11 (1990 [1992]): 89–104, and 12 (1991 [1992]): 127–145.
Luick, Karl
1914–1940 Historische Grammatik der englischen Sprache, 2 vols. Leipzig:
Bernhard Tauchnitz. [Reprint Stuttgart: Bernhard Tauchnitz, 1964.]
Lutz, Angelika
1991 Phonotaktisch gesteuerte Konsonantenveränderungen in der Geschichte des Englischen (Linguistische Arbeiten 272). Tübingen: Niemeyer.
Lutz, Angelika
1997 Lautwandel bei Wörtern mit imitatorischem oder lautsymbolischem Charakter in den germanischen Sprachen. In: Kurt Gustav Goblirsch, Martha Berryman Mayou and Marvin Taylor (eds.), Germanic studies in honor of Anatoly Liberman, 439–462. (NOWELE 31/32.) Odense: Odense University Press.
Morelli, Frida
1998 Markedness relations and implicational universals in the typology of
onset obstruent clusters. Proceedings of the Annual Meeting of the
North Eastern Linguistic Society [NELS] 28, vol. 2. Available on the
Internet at http://ebookbrowse.com/roa-251-morelli-2-pdf-d6710926
(24 April 2011).
Morelli, Frida
1999 The phonotactics and phonology of obstruent clusters in optimality
theory. Ph.D. Dissertation, University of Maryland at College Park.
Available on the Internet at http://roa.rutgers.edu/view.php3?id=432
(24 April 2011).
Murray, Robert W.
1982 Consonant cluster development in Pāli. Folia Linguistica Historica 3: 163–184.
Murray, Robert W.
1988 Phonological strength and Early Germanic syllable structure (Studies
in Theoretical Linguistics 1.) Munich: Wilhelm Fink.
Murray, Robert W., and Theo Vennemann
1982 Syllable contact change in Germanic, Greek, and Sidamo. Klagenfurter Beiträge zur Sprachwissenschaft 8: 321–349.
Restle, David, and Theo Vennemann
2001 Silbenstruktur. In: Martin Haspelmath, Ekkehard König, Wulf Oesterreicher and Wolfgang Raible (eds.), Sprachtypologie und sprachliche Universalien: Ein internationales Handbuch, II.1310–1336. (Handbücher zur Sprach- und Kommunikationswissenschaft 20.) 2 vols. Berlin: Walter de Gruyter.
Rochoń, Marzena
2000 Optimality in complexity: The case of Polish consonant clusters.
(Studia Grammatica 48.) Berlin: Akademie-Verlag.
Rohlfs, Gerhard
1972 Historische Grammatik der italienischen Sprache und ihrer Mundarten. (Bibliotheca Romanica 5.) 3 vols. Vol. I: Lautlehre. 2nd unchanged ed. [1st ed. 1949.] Bern: Francke.
Sievers, Eduard
1901 Grundzüge der Phonetik zur Einführung in das Studium der Lautlehre der indogermanischen Sprachen. 5th ed. Leipzig: Breitkopf & Härtel. [Reprint Hildesheim: Georg Olms 1976.]
Tillmann, Hans G., with Phil Mansell
1980 Phonetik: Lautsprachliche Zeichen, Sprachsignale und lautsprachlicher Kommunikationsprozeß. Stuttgart: Klett-Cotta.
Trask, R. Larry
1997 The history of Basque. London: Routledge.
Vennemann, Theo
1988 Preference laws for syllable structure and the explanation of sound
change: With special reference to German, Germanic, Italian, and
Latin. Berlin: Mouton de Gruyter.
Vennemann, Theo
1989 Language change as language improvement. In: Vincenzo Orioles (ed.), Modelli esplicativi della diacronia linguistica: Atti del Convegno della Società Italiana di Glottologia, Pavia, 15–17 settembre 1988, 11–35. Pisa: Giardini Editori e Stampatori. [Reprinted in:
On the relations between [sonorant] and [voice]
Rina Kreitman
Abstract
In previous literature it has been reported that the features [sonorant] and [voice] are
closely related. Voicing has long been linked to the feature [sonorant] as one of its
phonetic correlates, since voicing is one of the attributes common to all sonorant con-
sonants. It has been suggested that the distribution of the feature [voice] in clusters can
be predicted from the behavior of the feature [sonorant]. If sonority reversed clusters
are prohibited, voicing reversals, a situation where voicing decreases within a cluster
pre-vocalically, should not be tolerated either (Lombardi 1991). Here, I report on a
cross-linguistic typological study of the distribution of these two features in word-
initial onset clusters and how they relate to one another. The different typological
patterning of the two features and their internal markedness imply that it is impossible
to predict the typological patterning of clusters in terms of one of these features based
on the other. A language can be of one type in terms of [sonorant] but of a different
type in terms of [voice]. The typology presented can further predict language type
shifts due to historical changes. The prediction is that, no matter what stage a language is in, it must become a type of language predicted by the typology.
1. Introduction
1. Languages which are argued to rely on features other than [voice] to distinguish between obstruents were excluded from the survey, as will become evident in section 3.
own findings, to be reported here, do not support this position. Rather, I show
that the organization of onset clusters in terms of the feature [sonorant] follows
a different pattern from the organization of onset clusters in terms of the feature [voice]. I show that the claim that [+voice][−voice] clusters are closely correlated with SO clusters (Lombardi 1991) is untenable.
While it is possible that the two features [sonorant] and [voice] are closely
linked phonetically (Parker 2002, 2008), it is not immediately transparent that
they are mutually dependent. As will become evident from the typologies
presented here, the typological patternings of the two features are entirely
independent of each other and therefore, these two features cannot be reduced
to a single feature. Moreover, I show that the patterning of one feature does
not provide any clues about the typological patterning of the other feature.
Furthermore, the markedness relations of clusters in terms of the feature
[sonorant] are quite different from markedness relations in terms of the feature
[voice], which will become evident in the discussion in section 4. The typolo-
gies I present are a result of a cross linguistic survey, which included 63 lan-
guages from 22 language families.
The typologies presented here are based strictly on the phonological features
[sonorant] and [voice]. It is important to note that in this work I discuss the
feature [sonorant], which partitions the consonant set into two classes: the
class of obstruents and the class of sonorants. Following Zec (1995), I address
only the classes of obstruents and sonorants and do not address any further
distinctions within these classes. In other words, the phonological feature [sonorant] is not equated with the commonly used property 'sonority', expressed in terms of a scale. This paper does not address the further fine-grained distinctions found in more elaborate sonority scales or the behavior of such sonority scales but rather explores the relationship between the feature [sonorant] and the feature [voice].
In word-initial, bi-consonantal onset clusters there are four logical combinations of obstruents (O), standing for [−sonorant] consonants, and sonorants (S), standing for [+sonorant] consonants. The four logical possibilities for combining obstruent (O) and sonorant (S) consonants in an onset cluster are as in (1):
(1) a. OS b. OO c. SS d. SO
In the obstruent (O) class only consonantal segments specified for [−sonorant] are included; this includes both stops and fricatives. Conversely, only segments
specified for [+sonorant] are included in the sonorant (S) class. For the purpose
of this survey only, this latter group consisted of liquids and nasals. Glides
were excluded for reasons listed in (5).
Logically, a language can have any of the clusters in (1), or any combina-
tion of them, or none. A language that has none of the clusters listed in (1) is,
of course, a language that does not allow any consonantal clusters. We exam-
ine only those languages which allow at least one of the clusters listed in (1).
Given the cluster combinations in (1), a-priori there are fifteen logical possibilities for combining these clusters into groups of one to four cluster types. Therefore, a-priori there are fifteen logically possible language types, as in (2).
If a language L has only one of the onset clusters listed in (1), it can, a-priori,
be any one of them, as in (2a). If a language has two of the onset clusters in
(1), it can, a-priori be any of the sets listed in (2b). If a language has three of
the onset clusters in (1), it can have any of the sets listed in (2c). Finally, it is
logically possible for a language to have all four onset clusters listed in (1), as
in (2d). A language that has no onset clusters constitutes an empty group, { },
which is a sixteenth logically possible language type and is excluded from this
study.
(2) a. 1 cluster b. 2 clusters c. 3 clusters d. 4 clusters
{OS} {OS,OO} {OS,OO,SS} {OS,OO,SS,SO}
{OO} {OS,SS} {OS,OO,SO}
{SS} {OS,SO} {OS,SS,SO}
{SO} {OO,SS} {OO,SS,SO}
{OO,SO}
{SS,SO}
In sum, in (2) I list all fifteen logically possible language types (excluding
the empty group). The question arises, which of the logically possible language
types in (2) are occurring language types. To address this, I conducted a cross-
linguistic survey of languages which allow word initial onset clusters. The
methodology of the survey is outlined in section 2.1 and the results of the
survey are presented in section 2.2.
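The count of fifteen in (2) follows directly: the possible language types are the non-empty subsets of the four cluster types in (1), and there are 2^4 − 1 = 15 of them. A short sketch of the enumeration, with illustrative variable names:

```python
from itertools import combinations

# The fifteen logically possible language types of (2): every non-empty
# subset of the four word-initial onset cluster types in (1).
CLUSTER_TYPES = ['OS', 'OO', 'SS', 'SO']

language_types = [set(combo)
                  for size in range(1, 5)
                  for combo in combinations(CLUSTER_TYPES, size)]

print(len(language_types))  # prints 15: 4 + 6 + 4 + 1 subsets
```

The sizes group exactly as in (2): four one-cluster types, six two-cluster types, four three-cluster types, and one four-cluster type.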
2. Languages were also excluded for technical reasons, for example, if sources of data were incomplete or inconclusive. Some sources, for example, Matthews (1955) for Dakota, and Hoff (1968) for Carib, do not make a clear distinction between word-initial and word-medial clusters, which makes it impossible to distinguish them. Moreover, for Dakota, different grammars listed different possible clusters. Also excluded were languages for which data from different sources were inconsistent. One such example is Chukchee (Bogoras 1922, Kenstowicz 1981 and Levin 1985, among others). Some sources claim that Chukchee contains initial clusters (Levin 1985 following Bogoras 1922) while others (Kenstowicz 1981) claim that clusters in Chukchee are broken by vowel epenthesis. Skorik (1961) explains that in Chukchee some consonantal sequences can appear word-initially either with or without a vowel, but when the same sequence appears in an onset position word-medially, it must appear with the vowel between the two segments or with a preceding vowel, suggesting that the consonant sequences are not truly clusters underlyingly. This is confirmed in Asinovskii's (1991) acoustic data.
(5) Additional conditions and criteria for excluding languages from the
survey:
(i) A language was excluded if it had only obstruent + glide clusters. For example, Korean, which has obstruent + glide clusters such as py and gw, was not included.3
(ii) Also excluded from the survey were languages with only homorganic
nasal + obstruent clusters such as mb and nd. For example, Babungo
(Schaub 1985) has only simplex onsets and pre-nasalised onsets and
no other clusters. The phonological status of pre-nasalised sequences
is not immediately transparent. Such sequences can be a cluster or
a pre-nasalised segment (Maddieson and Ladefoged 1993, Riehl
2008). Without more information about the phonological status of
these sequences, it is impossible to determine whether a specic
sequence is a cluster or simply a pre-nasalised unary segment.
Languages which have non-homorganic nasal-obstruent sequences
in addition to homorganic nasal-obstruent sequences were included
in the survey. For example, if a language has mb clusters but also
mt or mk clusters (Taba, Bowden 2001), then the language was
included in the survey but the homorganic clusters were excluded
(i.e. they were not counted as SO clusters since their underlying
status is not always transparent, and they may or may not be clusters). The non-homorganic clusters were included in the survey.
(iii) Also excluded were languages which have only h + obstruent or ʔ + obstruent clusters, or obstruent + h and obstruent + ʔ clusters, such as Comanche (Riggs 1949), since these may function as pre- or post-aspiration or glottalization.4
In sum, the survey focuses on languages which allow bi-consonantal word
initial onset clusters. Some of the languages included in the survey, such
as Chatino (McKaughan 1954), Georgian (Butskhrikidze 2002), and Polish
(Sawicka 1974), to name a few, allow clusters longer than two consonants
but those clusters were not the focus of this survey.
3. Clusters with glides as the second member are not included in this survey. Surface
glides may have a different underlying status. They may be underlying glides that
surface as glides or they may be underlyingly vowels that surface as glides (Levi
2004, 2008). Due to the lack of transparency in the underlying status of glides, clusters with glides were excluded from the survey altogether.
4. Mazatec (Steriade 1994) and Temoayan Otomi (Andrews 1949) are examples of
languages which have mostly pre- and post-aspirated and pre- and post-glottalised
sequences as well as pre-nasalised sequences; therefore, they were excluded from the survey altogether.
Type     OS  OO  SS  SO  Language
Type 1   ✓               Basque, Wa
Type 2   ✓   ✓           Kutenai, Modern Hebrew
Type 3   ✓   ✓   ✓       Greek, Irish
Type 4   ✓   ✓   ✓   ✓   Georgian, Russian, Pashto
In sum, evident from Table (1) are the implicational relations captured in
(7). The implicational relations in (7) are all unidirectional and without exceptions in the languages of the survey. Next, I single out crucial asymmetries
evident in Table (1) and the implicational relations in (7).
(7) SO → SS → OO → OS
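The implicational relations in (7) can be checked mechanically: if a language has a cluster type, it must also have every type that type implies. A sketch, with illustrative names:

```python
# The implicational hierarchy of (7): SO implies SS, which implies OO,
# which implies OS. A cluster inventory is well-formed iff, whenever a
# type is present, everything to its right in the chain is present too.
HIERARCHY = ['SO', 'SS', 'OO', 'OS']

def obeys_hierarchy(inventory):
    """True iff the inventory respects the implications in (7)."""
    for i, ctype in enumerate(HIERARCHY):
        if ctype in inventory:
            # all implied types (to the right) must also be present
            if not all(t in inventory for t in HIERARCHY[i:]):
                return False
    return True
```

Of the fifteen logically possible types in (2), only the four attested ones ({OS}, {OS,OO}, {OS,OO,SS}, {OS,OO,SS,SO}) pass this check, matching Table (1).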
First, there is an asymmetry between the right and left edges of the implicational relations. The presence of SO clusters implies the presence of all other
clusters while OS clusters are implied by all other clusters. This asymmetry is
expected given that SO is of falling sonority, that is, violates the SSP, while
OS has a rise in sonority, i.e. conforms to the SSP. Based on the SSP we
expect clusters with rising sonority to occur more frequently than clusters
with reversed sonority. It is important to note that in this work an increase
or a rise in sonority means an increase from a negative value of the feature
[sonorant] to a positive one. That is, there is an increase in sonority from an obstruent to a sonorant.
            OS     OO     SS     SO
# of langs  63/63  54/63  32/63  19/63
%           100%   85.7%  50.8%  30%
Table (2) presents the distributional data for each cluster type. From
Table (2) it is evident that if a language allows a consonantal cluster word
initially, it will allow an OS cluster. More surprising is the frequency of OO,
SS and SO cross-linguistically. First, 30% of the languages in the survey admit
one or more SO clusters. This number is quite significant, making SO clusters
much more common than previously assumed. They are not anomalies occur-
ring rarely; rather, they occur cross-linguistically in languages as varied as
Russian (Indo-European) and Hua (Trans-New Guinea). Secondly, the asym-
metry between OO and SS clusters is quite robust, with OO clusters being
more than one and a half times more common than SS clusters, although
both constitute sonority plateaus.
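The implicational hierarchy in (7) and the percentages in Table (2) lend themselves to a small computational check. The following is a hypothetical sketch, not part of the survey itself: the sample inventories encode only the example languages named in Table (1), and all helper names are my own.

```python
# Hierarchy from most to least marked: SO implies SS, SS implies OO, OO implies OS.
HIERARCHY = ["SO", "SS", "OO", "OS"]

def respects_implications(inventory):
    """True if every cluster type present also brings along all types
    it implies (everything to its right in HIERARCHY)."""
    for i, t in enumerate(HIERARCHY):
        if t in inventory and not all(u in inventory for u in HIERARCHY[i + 1:]):
            return False
    return True

# Sample languages from Table (1), one per type (illustrative only).
SAMPLE = {
    "Basque":  {"OS"},                    # Type 1
    "Kutenai": {"OS", "OO"},              # Type 2
    "Greek":   {"OS", "OO", "SS"},        # Type 3
    "Russian": {"OS", "OO", "SS", "SO"},  # Type 4
}

assert all(respects_implications(inv) for inv in SAMPLE.values())
assert not respects_implications({"SO"})  # *{SO} alone is predicted not to occur

# Percentages as in Table (2): share of the 63 surveyed languages.
share = lambda n: round(100 * n / 63, 1)
assert (share(63), share(54), share(32), share(19)) == (100.0, 85.7, 50.8, 30.2)
```

The `share` computation reproduces the row of percentages in Table (2) from the raw counts, including the 30.2% figure for SO clusters.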
3. Voicing typology
Since only obstruents are specified for [voice], only a subset of the clusters
examined in section 2 will be the focus of this section.
Lombardi claims that the prohibition against [+v][-v] onset clusters is uni-
versal. According to her, voiced segments may occur only before a sonorant
segment, either a vowel or a sonorant consonant. Her argument continues that
a [+v][-v] obstruent cluster cannot be an occurring cluster type because voice-
less segments cannot intervene between a voiced obstruent and a vowel. She
refers to the figure in (9) as a Universal Sonority Constraint, ". . . an absolute
universal which no language can violate" (1991: 59). Moreover, Lombardi
correlates the prohibition in the figure in (9) with the prohibition on sonority-
reversed clusters. For her, voicing reversals are comparable to SO clusters.
As we will see in the next section, this parallel is untenable.
Likewise, Lindblom (1983) claims, based on the principle of gestural
economy, that [+v][-v] clusters should be excluded on phonetic grounds.
From (10) it is clear that the majority, or two thirds (66.7%), of the clusters
in Greenberg's survey are sequences of voiceless obstruents, [-v][-v]. All other
cluster types constitute the remaining third. Of these, almost 22% are [+v][+v]
and just under 11% are [-v][+v] clusters. Under 1% of all clusters are [+v][-v]
clusters. However, the numbers are somewhat misleading, since Greenberg
does not separate obstruent clusters from sonorant clusters. A tn cluster, for
example, is considered a [-v][+v] cluster. This skews the numbers of [-v][+v]
clusters and [+v][+v] clusters, making it difficult to correctly decipher the sta-
tistical data.
Evident from this survey is that clusters in which both members are voice-
less are preferable to clusters with any other voicing combination. Mixed-voic-
ing clusters are a great minority, at just a little over 12% of all clusters, but both
[-v][+v] and [+v][-v] clusters exist. However, while Greenberg accepts the
existence of [-v][+v] clusters, he doubts the existence of [+v][-v] clusters,
although his survey lists two languages, Bilaan and Khasi, for which obstruent
[+v][-v] clusters have been reported. Since Greenberg's sources for Bilaan and
Khasi (Dean 1955 and Rabel 1961, respectively) presented no phonetic evi-
dence for [+v][-v] obstruent clusters, Greenberg allows for the possibility
that the reported [+v][-v] clusters are phonetically realised as [-v][-v];
clusters like bt reported for Khasi and bs reported for Bilaan might actually
be phonetically realised as pt and ps respectively. Since Bilaan does not distin-
guish between b and p, and contains only b in its phonemic inventory, it is
possible that the cluster bs listed in the grammar is phonetically realised as ps.
Some languages are claimed to have a laryngeal contrast based on the feature
[spread glottis] (Iverson and Salmons 1995, Jessen 2001, Jessen and Ringen
2002 among others). That is, they are claimed to have a distinction between
aspirated and unaspirated stops rather than voiced and voiceless stops. For
some of these languages (e.g. German), there are conflicting claims regarding
the proper distinctive laryngeal feature. Since the nature of the distinctive
feature in these languages is controversial but outside the scope of this work,
these languages were excluded from the survey for the feature [voice]. The
methodology I employed is the same as outlined in section 2.1. The languages
included in this section are also listed in appendix I.
Results of the survey indicate that in reality clusters of the [+v][-v] type do
occur, albeit rarely. Six languages are reported to contain such clusters, and
three cases have been documented with supporting phonetic evidence in
the literature:
(i) Khasi, in which dk in dkar 'tortoise' is distinct from tk in tkor-tkor
'plump and tender'. According to Henderson (1991), dissimilation of
voicing is a widespread feature in Khasi. However, few phonetic details
are available, and, unfortunately, in the case of the only instrumental
investigation (Henderson 1991, reproduced in Kreitman 2008, 2010) it
is not clear that the material was produced by a native speaker.
(ii) Tsou, in which s is distinct from ps (Wright 1996, reproduced in Kreitman
2008, 2010);5
(iii) Modern Hebrew, in which dk in dkalim 'palms' is distinct from tk in
tkarim 'flat tires' and dg in dgalim 'flags' (Kreitman 2008, 2010).
Figure 1 is a spectrogram of the word dkalim 'palm trees' from Modern
Hebrew (Kreitman 2008). It is an illustrative sample which provides acoustic
phonetic evidence for the existence of [+v][-v] clusters, in addition to the evi-
dence available from Khasi and Tsou. A much wider range of utterances and
many more examples of the occurrence of [+v][-v] clusters can be found in
Kreitman (2008, 2010).
Given these facts, Lombardi's cross-linguistic prohibition against [+v][-v]
clusters and Lindblom's prediction that [+v][-v] clusters cannot be produced
have no empirical basis.
The presence of at least one varied voicing combination implies the presence
of a [-v][-v] cluster. But in a Type 3 language both a [-v][+v] cluster and a
[+v][+v] cluster are present, as in Georgian. By implication, a Type 3 language
also contains a [-v][-v] cluster, as in (14):
(14) [+v][+v]
       +
     [-v][+v] → [-v][-v]
A Type 4 language has all possible voicing combinations, as in (12). Lan-
guages which belong to this type include Modern Hebrew, Tsou, Hua and
Khasi.7
A Type 5 language, however, contains only one cluster with varying voic-
ing and, by implication, also a [-v][-v] cluster, as in (15) below. Languages
which belong to this type include Biloxi and Camsa.
(15) [-v][+v] → [-v][-v]
A Type 6 language has both possible varying voicing clusters, [-v][+v] and
[+v][-v], and therefore by implication also [-v][-v] clusters, as in (16):
(16) [+v][-v] → [-v][+v] → [-v][-v]
A Type 6 language is typologically predicted on the basis of the implica-
tional relations in (12); in Table (3) this is exemplified by Bilaan and Amuesha.
The only available grammatical description of Bilaan (Dean 1955) lists [-v][-v],
[-v][+v] and [+v][-v] as occurring clusters, making Bilaan a Type 6 language.
However, as mentioned previously, in the absence of phonetic evidence the cases
of [+v][-v] in Bilaan are suspect. Without further phonetic investigation it is
impossible to determine whether the [+v][-v] clusters are realised as such in
Bilaan or whether some other phonetic properties are used to distinguish these
clusters.
7. Berber (Berber) and Moroccan Arabic (Semitic) may also be Type 4 languages in
terms of the feature [voice], since they allow [+v][-v] clusters word initially. This
could, potentially, increase the number of Type 4 languages in terms of the feature
[voice] to 8 and the percentage of languages which permit [+v][-v] clusters to
17%. However, these languages were not included in the survey for two reasons.
First, the available sources did not give exhaustive coverage of permissible
clusters. Second, the syllabic status of the initial clusters in these languages is
controversial (Dell and Elmedlaoui 2002, Shaw et al. 2009 and references
therein).
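As a hedged sketch, the voicing types discussed in this section can be encoded as mappings from cluster inventories to type labels. Only the type/inventory pairings actually named in the surrounding text are included (Type 2 from the conclusion's Russian example, Types 3-6 from (14)-(16)); the dictionary and function names are my own, not the author's.

```python
# Voicing combinations are encoded as strings: "-v-v" = [-v][-v], "+v-v" = [+v][-v], etc.
VOICING_TYPES = {
    frozenset({"-v-v", "+v+v"}):                 2,  # e.g. Russian
    frozenset({"-v-v", "-v+v", "+v+v"}):         3,  # e.g. Georgian, as in (14)
    frozenset({"-v-v", "+v+v", "-v+v", "+v-v"}): 4,  # e.g. Modern Hebrew, Tsou
    frozenset({"-v-v", "-v+v"}):                 5,  # e.g. Biloxi, as in (15)
    frozenset({"-v-v", "-v+v", "+v-v"}):         6,  # e.g. Bilaan, as in (16)
}

def voicing_type(clusters):
    """Return the voicing type of an obstruent-cluster inventory,
    or None if the inventory is not one of the encoded types."""
    return VOICING_TYPES.get(frozenset(clusters))

assert voicing_type({"-v-v", "+v+v"}) == 2
assert voicing_type({"-v-v", "+v+v", "-v+v", "+v-v"}) == 4
assert voicing_type({"+v-v"}) is None  # mixed voicing implies [-v][-v], per (16)
```

The `None` case reflects the implicational claim that a mixed-voicing cluster never occurs without a [-v][-v] cluster also being present.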
The numbers presented in Table (4) for the distribution of the various voic-
ing combinations in clusters differ quite significantly from the numbers found
by Greenberg (1965), provided in (10). This is to be expected, considering that
Greenberg calculated the distribution of each cluster type out of the entire
set of cluster types, while the calculation presented here shows how many
languages contain a certain cluster type out of the subset of obstruent clusters
only. Surprisingly, the [+v][+v] cluster type is much rarer than initially ex-
pected. Conversely, the mixed cluster types are more common than initially
expected.
We are now in a position to compare the implicational relations for the two
typologies presented in section 2 (for the feature [sonorant]) and section 3
(for the subset of obstruents specified for the feature [voice]). The implica-
tional relations found for the feature [sonorant] are repeated in (17a) and the
implicational relations found for the feature [voice] are repeated in (17b).
(17) (a) Sonority implicational relations:
SO → SS → OO → OS
A language which disallows clusters at one stage, but allows them at another
stage, is said to shift types. Clusters may become part of the grammar in several
ways: borrowing, or morphological or phonological processes such as syncope.
Predictions regarding language type shifts follow from the implicational rela-
tions stated in (7) and (17a). A language L1 of type T1 can change membership
and become a member of another type, T2, by changing the inventory of
clusters allowed by the language's grammar. It follows from (7) that if a
language has no clusters, then the first cluster type it will acquire is OS.
Thus, a language with no clusters can shift to become a Type 1 language,
i.e. a language with OS clusters.
Examples of languages that shifted types are West Greenlandic (Fortescue
1984) and Popoluca (Elson 1947). Both languages disallowed consonantal
clusters word initially at an earlier point in their history, and due to borrowing
(from Danish and Spanish respectively), have shifted to become Type 1 lan-
guages; both now allow OS clusters.
A language may also gain clusters through a process of vowel syncope. For
example, a vowel may be consistently deleted in the first syllable of every
word. That could result in a language gaining all types of clusters at once and
becoming a Type 4 language. However, a language cannot gain only {OO} or
only {SS} clusters, as languages with only {OO} or {SS} clusters are not
empirically attested and are therefore not part of the typology.
It is also possible for a language to lose clusters. Once again, it is predicted
that if a language loses one cluster type, it will lose the cluster type which
implies all other clusters. Thus, a language of Type 4, which allows reversed
sonority clusters, those that imply all other clusters, may disallow such clusters
and shift to become a Type 3 language.
The prediction is that no matter what stage the language is in, if it gains
or loses clusters, it must become a language type which is predicted by the
typology. A language will never gain only OO and SS clusters without having
OS clusters as well, because the set *{OO, SS} cannot belong to an occurring
language type.
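The type-shift prediction can be stated as a small check: gaining or losing cluster types is licit only if both the old and the new inventories correspond to attested types. A minimal sketch, assuming the four sonority types of Table (1) plus the cluster-less state; the function name is my own.

```python
# Attested word-initial cluster inventories: no clusters, plus Types 1-4 of Table (1).
ATTESTED = [
    set(),
    {"OS"},
    {"OS", "OO"},
    {"OS", "OO", "SS"},
    {"OS", "OO", "SS", "SO"},
]

def legal_shift(before, after):
    """A type shift is predicted to be possible only between attested inventories."""
    return before in ATTESTED and after in ATTESTED

# West Greenlandic / Popoluca: no clusters -> Type 1 via borrowing.
assert legal_shift(set(), {"OS"})
# Type 4 -> Type 3: losing the SO clusters that imply all the others.
assert legal_shift({"OS", "OO", "SS", "SO"}, {"OS", "OO", "SS"})
# *{OO, SS} is not an occurring type, so it can never be the result of a shift.
assert not legal_shift(set(), {"OO", "SS"})
```

The last assertion encodes the text's claim that a language will never gain only OO and SS clusters without OS clusters as well.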
6. Conclusion
While claims in the literature link the feature [sonorant] and the feature
[voice], it has been shown here that they may not be so closely correlated, at
least not typologically. This suggests that, although these two features may
interact in complex ways, they are not mutually dependent and their typologi-
cal patterning cannot be reduced to a single pattern. That is, the phonological
patterning of one of these features in clusters cannot be conjectured based on
the other feature. The typological patterning of clusters based on the feature
[sonorant] does not provide any clues about the phonological patterning of
the feature [voice] in clusters. A language can be of one type with regard to
one of these features, and another type with regard to the other. For example,
Russian exhibits all possible clusters for the feature [sonorant], OS, OO, SS
and SO, making it a Type 4 language in terms of the feature [sonorant], yet
only two combinations of the feature [voice] are permitted, [-v][-v] and
[+v][+v], making it a Type 2 language in terms of the feature [voice]. Russian,
thus, is elaborate in terms of the combinations it allows word initially for the
feature [sonorant] but relatively simple in terms of the voicing combinations it
permits. Modern Hebrew is the opposite example. It only allows two cluster
types in terms of the feature [sonorant], OS and OO, making it a Type 2
language in terms of the feature [sonorant], but allows all possible voicing
combinations, [-v][-v], [+v][+v], [-v][+v] and [+v][-v], making it a Type 4
language in terms of the feature [voice]. Modern Hebrew is simple in terms
of the combinations it allows word initially for [sonorant] but quite complex
in terms of the voicing combinations it permits. This suggests that typological
classification of languages based on either one of these features should be
explored independently.
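The conclusion's point that the two classifications are independent can be made concrete by looking the same language up separately in a sonority-type table and a voicing-type table. A hypothetical sketch (names are mine; only the types cited in this section are encoded):

```python
# Sonority types of Table (1) and the two voicing types named in the conclusion.
SONORITY_TYPES = {
    frozenset({"OS"}): 1,
    frozenset({"OS", "OO"}): 2,
    frozenset({"OS", "OO", "SS"}): 3,
    frozenset({"OS", "OO", "SS", "SO"}): 4,
}
VOICING_TYPES = {
    frozenset({"-v-v", "+v+v"}): 2,
    frozenset({"-v-v", "+v+v", "-v+v", "+v-v"}): 4,
}

# Each language is a pair: (sonority inventory, voicing inventory).
russian = ({"OS", "OO", "SS", "SO"}, {"-v-v", "+v+v"})
hebrew = ({"OS", "OO"}, {"-v-v", "+v+v", "-v+v", "+v-v"})

# Russian: elaborate for [sonorant] (Type 4), simple for [voice] (Type 2).
assert SONORITY_TYPES[frozenset(russian[0])] == 4
assert VOICING_TYPES[frozenset(russian[1])] == 2
# Modern Hebrew is the mirror image: Type 2 for [sonorant], Type 4 for [voice].
assert SONORITY_TYPES[frozenset(hebrew[0])] == 2
assert VOICING_TYPES[frozenset(hebrew[1])] == 4
```

Because the two lookups are entirely separate, neither table constrains the other, which is exactly the independence claim of the conclusion.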
Appendix I
Language          [sonorant]: OS OO SS SO      [voice]: [-v][-v] [+v][+v] [-v][+v] [+v][-v]
Aguacatec Z Z Z Z Z Z
Aleut Z Z Z
Amuesha Z Z Z Z Z
Basque Z
Belarusian Z Z Z Z Z
Bilaan Z Z Z Z Z Z Z
Biloxi Z Z Z Z
Breton Z Z Z
Bulgarian Z Z Z Z Z
Cambodian Z Z Z Z Z Z
Camsa Z Z Z Z
Chami Z
Chatino Z Z Z Z Z Z13
Cornish Z Z Z
Czech Z Z Z Z Z Z
Danish Z Z 12
Dutch Z Z Z
Embara Catio Z
Frisian Z Z Z
Gaelic (Scots) Z Z Z 12
Georgian Z Z Z Z Z Z Z
German Z Z 12
Greek Z Z Z Z Z
Hebrew (Modern) Z Z Z Z Z Z
Hindi Z Z Z Z
Hixkaryana Z Z Z
Hua Z Z Z Z Z Z Z Z
Hungarian10 Z Z Z Z9 Z
Icelandic Z Z 12
Inga Z Z Z Z Z
Irish Z Z Z 12
Khasi Z Z Z Z Z Z Z Z
Klamath Z Z Z Z 12
Kobon Z
Kutenai Z Z Z
Lithuanian Z Z Z Z
Macedonian Z Z Z Z Z
Manx Z Z Z Z
Mon (Burmese) Z
Norwegian Z Z Z
Pashto Z Z Z Z Z Z
Polish Z Z Z Z Z
Popoluca Z
Romani Z Z Z
Romanian Z Z Z Z Z
Russian Z Z Z Z Z Z
Serbian Z Z Z Z Z
Seri Z Z Z
Slovak Z Z Z Z Z Z
Slovenian Z Z Z Z Z
Sorbian (lower) Z Z Z Z Z Z
Sorbian (upper) Z Z Z Z Z Z
Spanish Z
Swedish Z Z Z
Taba Z Z Z Z Z
Totonac Z Z Z
Tsou Z Z Z Z Z Z14 Z Z
Ukrainian Z Z Z Z Z Z
Wa Z
Welsh Z Z 12
Yiddish Z Z Z Z Z Z
Zapotec (Isthmus)11 Z Z Z
Zoque Z
Language database (An asterisk (*) next to the language name indicates that
the language was not included in the survey as it either did not contain any
clusters or did not conform to the conditions listed in (3) and (5)):
Aguacatec (Mayan) McArthur and McArthur 1956
Aleut (Eskimo-Aleut) Bergsland 1997
Amuesha (Arawakan) Fast 1953
Armenian* (Armenian, Indo-European) Werner 1962; Vaux 1998
Arabic* (Moroccan) Shaw, Gafos, Hoole and Zeroual 2009; Dell and Elmedlaoui
2002
Asheninka* (Arawakan) Dirks 1953
Basque (Basque) Hualde 1991
Babungo* (Niger-Congo) Schaub 1985
Belarusian (Slavic Indo-European) Sawicka 1974
Bengali* (Indo-Iranian, Indo-European) Ferguson and Chowdhury 1960
Berber* (Berber) Dell and Elmedlaoui 2002
Bilaan (Austronesian) Dean and Dean 1955
Biloxi (Siouan) Einaudi 1976
Breton (Celtic, Indo-European) Ternes 1992
Bulgarian (Slavic Indo-European) Scatton 1984; Sawicka 1974
Burmese* (Sino-Tibetan) Sun 1986
Cambodian (Mon-Khmer) Nacaskul 1978
Camsa (language isolate) Howard 1967
Carib* (Carib) Hoff 1968
Chami (Choco) Gralow 1976
Chatino (Oto-Manguean) McKaughan 1954
Chukchee* (Chukotko-Kamchatkan) Asinovskii 1991; Bogoras 1922; Kenstowicz
1981; Levin 1985; Skorik 1961
Comanche* (Uto-Aztecan) Riggs 1949
Cornish (Celtic, Indo-European) George 1993
Cuicateco* (Oto-Manguean) Needham and Davis 1946
Czech (Slavic, Indo-European) Kučera 1961; Kučera and Monroe 1968
Dakota* (Siouan) Matthews 1955
Danish (Germanic, Indo-European) Diderichsen 1964; Hansen 1967
Dutch (Germanic, Indo-European) Booij 1995
Eggon* (Niger-Congo) Maddieson 1981
Embara Catio (Choco) Mortensen 1999
French* (Romance, Indo-European) Dell 1995
Frisian (Germanic, Indo-European) Cohen, Ebeling, Fokkema and van Holk 1961
Gaelic (Scots) (Celtic, Indo-European) Gillies 1993; Green 1997
Georgian (Kartvelian) Butskhrikidze 2002; Chitoran 1998; Chitoran 1999;
Chitoran, Goldstein and Byrd 2002; Gvarjaladze and Gvarjaladze 1974
German (Germanic, Indo-European) Wiese 1996
Greek (Greek, Indo-European) Eleftheriades 1985; Joseph and Philippaki-
Warburton 1987
References
Agard, Frederick B.
1958 Structural sketch of Rumanian. Language, 34(1): 71–127.
Ambrazas, Vytautas
1997 Lithuanian Grammar. Vilnius: Baltos lankos.
Andrews, Henrietta
1949 Phonemes and morphophonemes of Temoayan Otomi. International
Journal of American Linguistics, 15: 213–222.
Aschmann, Herman P.
1946 Totonaco phonemes. International Journal of American Linguistics,
12: 34–43.
Asinovskii, Aleksandr Semenovich
1991 Konsonantizm Chukotskogo Jazyka [Consonantism of the Chukchee
language]. Leningrad: Nauka. (In Russian).
Awbery, Gwenllian M.
1984 Phonotactic constraints in Welsh. In Martin J. Ball and Glyn E.
Jones (eds.), Welsh Phonology, Selected Readings, 65–104. Cardiff:
University of Wales Press.
Ball, Martin and James Fife (eds.)
1993 The Celtic Languages. London: Routledge.
Barkai, Malachi and Julia Horvath
1978 Voicing assimilation and the sonority hierarchy: Evidence from Rus-
sian, Hebrew and Hungarian. Linguistics, 212: 77–88.
Barker, Muhammad A. R.
1964 Klamath Grammar. University of California Publications in Linguis-
tics 32. University of California Press.
Bat-El, Outi
1994 Stem modification and cluster transfer in Modern Hebrew. Natural
Language and Linguistic Theory, 12: 571–596.
Bergsland, Knut
1997 Aleut Grammar: Unangam Tunuganaan Achixaasix. Fairbanks:
Alaska Native Language Center.
Berman, Ruth
1997 Modern Hebrew. In Robert Hetzron (ed.), The Semitic Languages,
312–333. New York: Routledge.
Blevins, Juliette
2003 The independent nature of phonotactic constraints: An alternative
to syllable-based approaches. In Caroline Féry and Ruben van
de Vijver (eds.), The Syllable in Optimality Theory, 375–404.
Cambridge: Cambridge University Press.
Bogoras, Waldemar
1922 Chukchee. In Franz Boas (ed.), Handbook of American Indian
Languages: Part 2. Washington: Smithsonian.
Booij, Geert
1995 The Phonology of Dutch. Oxford: Oxford University Press.
Bowden, John
2001 Taba: Description of a South Halmahera language. Pacific Linguis-
tics 521. Canberra: Australian National University.
Briggs, Elinor
1961 Mitla Zapotec Grammar. Mexico: Instituto Lingüístico de Verano
and Centro de Investigaciones Antropológicas de México.
Broderick, George
1993 Manx. In Martin Ball and James Fife (eds.), The Celtic Languages,
228–288. London: Routledge.
Butskhrikidze, Marika
2002 The Consonant Phonotactics of Georgian. Utrecht: LOT.
Chavarria-Aguilar, O. L.
1951 The phonemes of Costa Rican Spanish. Language, 27(3): 248–253.
Chayen, Moshe J.
1972 The accent of Israeli Hebrew. Leshonenu, 36: 212–219, 287–300.
Chayen, Moshe J.
1973 The Phonetics of Modern Hebrew. The Hague: Mouton.
Chitoran, Ioana
1998 Georgian harmonic clusters: Phonetic cues to phonological represen-
tation. Phonology, 15(2): 121–141.
Chitoran, Ioana
1999 Accounting for sonority violations: The case of Georgian consonant
sequencing. Proceedings of the 14th International Congress of
Phonetic Sciences, 101–104. San Francisco, August 1999.
Chitoran, Ioana, Louis Goldstein and Dani Byrd
2002 Gestural overlap and recoverability: Articulatory evidence from
Georgian. In Carlos Gussenhoven and Natasha Warner (eds.),
Laboratory Phonology 7, 419–447. Berlin, New York: Mouton de
Gruyter.
Cho, Young-mee Yu and Tracy Holloway King
2003 Semi-syllables and universal syllabification. In Caroline Féry and
Ruben van de Vijver (eds.), The Syllable in Optimality Theory,
183–212. Cambridge: Cambridge University Press.
Clements, Nick G.
1990 The role of the sonority cycle in core syllabification. In John King-
ston and Mary Beckman (eds.), Papers in Laboratory Phonology I:
Einaudi, Paula
1976 A Grammar of Biloxi. New York: Garland.
Eleftheriades, Olga
1985 Modern Greek: A Contemporary Grammar. Palo Alto: Pacific Books
Publishers.
Elson, Ben
1947 Sierra Popoluca syllable structure. International Journal of American
Linguistics, 13(1): 13–17.
Engelenhoven, Aone van
1995 A Description of the Leti Language (as spoken in Tutukei). Ridderkerk:
Offsetdrukkerij Ridderprint B.V.
Everett, Daniel and Keren Everett
1984 On the relevance of syllable onsets to stress placement. Linguistic
Inquiry, 15: 705–711.
Fast, Peter W.
1953 Amuesha (Arawak) phonemes. International Journal of American
Linguistics, 19: 191–194.
Ferguson, Charles and Munier Chowdhury
1960 The phonemes of Bengali. Language, 36(1): 22–59.
Fortescue, Michael D.
1984 West Greenlandic. London: Croom Helm.
Garvin, Paul L.
1948 Kutenai I: Phonemics. International Journal of American Linguistics,
14: 37–42.
George, Ken
1993 Cornish. In Martin Ball and James Fife (eds.), The Celtic Languages,
410–470. London: Routledge.
Gillies, William
1993 Scottish Gaelic. In Martin Ball and James Fife (eds.), The Celtic
Languages, 145–227. London: Routledge.
Goedemans, Rob
1998 Weightless Segments. The Hague: Holland Academic Graphics.
Gordon, Matthew
1999 Syllable weight: Phonetics, phonology, and typology. Ph.D. disserta-
tion, Department of Linguistics, University of California, Los
Angeles.
Gralow, Frances L.
1976 Fonología del Chamí [Chami phonology]. Sistemas Fonológicos de
Idiomas Colombianos 3, 29–42. Bogotá: Ministerio de Gobierno
and Instituto Lingüístico de Verano.
Green, Anthony
1997 The prosodic structure of Irish, Scots Gaelic, and Manx. Ph.D. dis-
sertation, Department of Linguistics, Cornell University.
Greenberg, Joseph
1965 Some generalizations concerning initial and final consonant sequences.
Linguistics, 18: 5–34. (Reprinted as Greenberg 1978).
Greenberg, Joseph
1978 Some generalizations concerning initial and final consonant clusters.
In Joseph H. Greenberg (ed.), Universals of Human Language, vol.
2: Phonology. Stanford, California: Stanford University Press.
Gumperz, John
1958 Phonological differences in three Hindi dialects. Language, 34:
212–224.
Gussmann, Edmund
1992 Resyllabification and delinking: The case of Polish voicing. Linguistic
Inquiry, 23: 25–56.
Gvarjaladze, Tamar and Isidor Gvarjaladze
1974 English-Georgian Dictionary. Tbilisi: State Publication House.
Haiman, John
1980 Hua, a Papuan Language of the Eastern Highlands of New Guinea.
Amsterdam: John Benjamins.
Hajek, John and John Bowden
1999 Taba and Roma: Clusters and geminates in two Austronesian lan-
guages. In Proceedings of the XIVth Congress of Phonetic Sciences,
San Francisco, 1–7 August, 1033–1036. American Institute of
Physics.
Halpern, Abraham Meyer
1946 Yuma I: Phonemics. International Journal of American Linguistics,
12(1): 25–33.
Hansen, Aage
1967 Moderne Dansk [Modern Danish]. København: Forlag Harley. (In
Danish).
Henderson, Eugénie
1991 Khasi clusters and Greenberg's universals. Mon-Khmer Studies,
18–19: 6–16.
Hoard, James E.
1978 Remarks on the nature of syllabic stops and affricates. In Alan Bell
and Joan Hooper (eds.), Syllables and Segments. Amsterdam: North-
Holland.
Hodge, Carleton T.
1946 Serbo-Croatian phonemes. Language, 22: 112–120.
Hoff, Bernard J.
1968 The Carib Language: Phonology, Morphonology, Morphology, Texts
and Word Index. The Hague: Martinus Nijhoff.
Howard, Linda
1967 Camsa phonology. In Viola G. Waterhouse (ed.), Phonemic Systems
of Colombian Languages, 73–87. Summer Institute of Linguistics
Publications in Linguistics and Related Fields, 14. Norman: Summer
Institute of Linguistics of the University of Oklahoma.
Hsin, Tien-Hsin
2000 Consonant clusters in Tsou and their theoretical implications. The
Proceedings of the 18th West Coast Conference on Formal Linguis-
tics. Cascadilla Press.
Hualde, José Ignacio
1991 Basque Phonology. London, New York: Routledge.
Huffman, Franklin E.
1990 Burmese, Thai Mon, and Nyah Kur: A synchronic comparison.
Mon-Khmer Studies, 16–17: 31–64.
Hyman, Larry
1985 A Theory of Phonological Weight. Dordrecht: Foris.
Itô, Junko
1989 A prosodic theory of epenthesis. Natural Language and Linguistic
Theory, 7: 217–259.
Iverson, Gregory and Joseph Salmons
1995 Aspiration and laryngeal representation in Germanic. Phonology, 12:
369–396.
Jacobs, Neil G.
2005 Yiddish: A Linguistic Introduction. Cambridge: Cambridge University
Press.
Jessen, Michael
2001 Phonetic implementation of the distinctive auditory features [voice]
and [tense] in stop consonants. In Tracy Alan Hall (ed.), Distinctive
Feature Theory, 237–294. Berlin, New York: Mouton de Gruyter.
Jessen, Michael and Catherine O. Ringen
2002 Laryngeal features in German. Phonology, 19: 189–218.
Joseph, Brian D. and Irene Philippaki-Warburton
1987 Modern Greek. London: Croom Helm.
Kahn, Daniel
1976 Syllable-based generalizations in English phonology. Ph.D. disserta-
tion, Department of Linguistics, Massachusetts Institute of Technol-
ogy. [Published 1980, New York: Garland Press.]
Keating, Patricia A.
1984 Phonetic and phonological representation of stop consonant voicing.
Language, 60: 286–319.
Kenstowicz, Michael
1981 The phonology of Chukchee consonants. In Bernard Comrie (ed.),
Studies in the Languages of the USSR. Carbondale: Linguistic
Research Inc.
Kreitman, Rina
2003 Diminutive reduplication in Modern Hebrew. Working Papers of the
Cornell Phonetics Laboratory, 15: 101–129.
Kreitman, Rina
2006 Cluster buster: A typology of onset clusters. In J. Bunting, S. Desai,
R. Peachy, C. Straughn and Z. Tomková (eds.), Chicago Linguistic
Society, 42(1): 163–179.
Kreitman, Rina
2008 The phonetics and phonology of onset clusters: The case of Modern
Hebrew. Ph.D. dissertation, Department of Linguistics, Cornell
University.
Kreitman, Rina
2010 Mixed voicing word-initial onset clusters. In Cécile Fougeron,
Barbara Kühnert, Mariapaola D'Imperio and Natalie Vallée (eds.),
Laboratory Phonology 10: Phonology and Phonetics, 169–200.
Berlin: Mouton de Gruyter.
Kučera, Henry
1961 The Phonology of Czech. The Hague: Mouton and Company.
Kučera, Henry and George Monroe
1968 A Comparative Quantitative Phonology of Russian, Czech and
German. New York: American Elsevier Publication.
Ladefoged, Peter and Ian Maddieson
1996 The Sounds of the World's Languages. Oxford: Blackwell.
Laufer, Asher
1994 Voicing in contemporary Hebrew. Leshonenu, 57(4): 299–342. (In
Hebrew).
Levi, Susannah V.
2004 The representation of underlying glides. Ph.D. dissertation, Depart-
ment of Linguistics, University of Washington.
Levi, Susannah V.
2008 Phonemic vs. derived glides. Lingua, 118: 1956–1978.
Levin, Juliette
1985 A metrical theory of syllabicity. Ph.D. dissertation, Department of
Linguistics, Massachusetts Institute of Technology.
Levinsohn, Stephen H.
1979 Fonología del Inga [Phonology of Inga]. In Marilyn E. Cathcart et
al. (eds.), Sistemas Fonológicos de Idiomas Colombianos 4, 65–85.
Bogotá: Ministerio de Gobierno. (In Spanish).
Lindblom, Björn
1983 Economy of speech gestures. In Peter MacNeilage (ed.), Speech
Production, 217–246. New York: Springer-Verlag.
Lindblom, Björn and Ian Maddieson
1988 Phonetic universals in consonant systems. In Larry M. Hyman and
Charles N. Li (eds.), Language, Speech and Mind, 62–78. New
York: Routledge.
Lombardi, Linda
1991 Laryngeal features and laryngeal neutralization. Ph.D. dissertation,
Department of Linguistics, University of Massachusetts, Amherst.
Lombardi, Linda
1995a Laryngeal features and privativity. The Linguistic Review, 12:
35–59.
Lombardi, Linda
1995b Laryngeal neutralization and syllable wellformedness. Natural Lan-
guage and Linguistic Theory, 13: 39–74.
Lombardi, Linda
1999 Positional faithfulness and voicing assimilation in Optimality
Theory. Natural Language and Linguistic Theory, 17: 267–302.
MacKay, Carolyn J.
1994 A sketch of Misantla Totonac phonology. International Journal of
American Linguistics, 60(4): 369–419.
MacKay, Carolyn J.
1999 A Grammar of Misantla Totonac. Salt Lake City: The University of
Utah Press.
Maddieson, Ian
1981 Unusual consonant clusters and complex segments in Eggon. Studies
in African Linguistics, Supplement 8: 89–92.
Maddieson, Ian and Peter Ladefoged
1993 Phonetics of partially nasal consonants. In Marie K. Huffman and
Rena Krakow (eds.), Nasals, Nasalization and the Velum, 251–301.
San Diego: Academic Press.
Mallinson, Graham
1986 Rumanian. London: Croom Helm.
Marlett, Stephen A.
1988 The syllable structure of Seri. International Journal of American
Linguistics, 54: 245–278.
Marlett, Stephen A. and Velma B. Pickett
1987 The syllable structure and aspect morphology of Isthmus Zapotec.
International Journal of American Linguistics, 53: 398–422.
Matthews, Hubert
1955 A phonemic analysis of a Dakota dialect. International Journal of
American Linguistics, 21: 56–59.
McArthur, Harry S. and Lucille E. McArthur
1956 Aguacatec (Mayan) phonemes within the stress group. International
Journal of American Linguistics, 22: 72–76.
McCarthy, John and Alan Prince
1986 Prosodic morphology. Ms., University of Massachusetts, Amherst,
and Brandeis University, Waltham, Mass.
McKaughan, Howard P.
1954 Chatino formulas and phonemes. International Journal of American
Linguistics, 20: 23–27.
Morelli, Frida
1998 Markedness relations and implicational universals in the typology of
onset obstruent clusters. In Proceedings of NELS 28: Volume 2.
Morelli, Frida
1999 The phonotactics and phonology of obstruent clusters in Optimality
Theory. Ph.D. dissertation, Department of Linguistics, University of
Maryland at College Park.
Morelli, Frida
2003 The relative harmony of /s+stop/ onsets: Obstruent clusters and the
sonority sequencing principle. In Caroline Féry and Ruben van de
Vijver (eds.), The Syllable in Optimality Theory, 356–371. Cam-
bridge: Cambridge University Press.
Mortensen, Charles A.
1999 A Reference Grammar of the Northern Embera Languages: Studies
in the Languages of Colombia 7. Publications in Linguistics, 118.
Dallas: Summer Institute of Linguistics and the University of Texas
at Arlington.
Nacaskul, Karnchana
1978 The syllabic and morphological structure of Cambodian words.
Mon-Khmer Studies, 7: 183–200.
Nagaraja, Keralapura S.
1990 Khasi Phonetic Reader. Mysore: Central Institute of Indian Languages.
Næs, Olav
1965 Norsk Grammatikk: Elementære Strukturer og Syntaks [Norwegian
Grammar: Elementary Structures and Syntax]. Fabritius & Sønners
Forlag. (In Norwegian).
Needham, Doris and Marjorie Davis
1946 Cuicateco phonology. International Journal of American Linguistics,
12: 139–146.
Nepveu, Denis
1994 Georgian and Bella Coola: Headless syllables and syllabic obstruents.
MA thesis, UC Santa Cruz.
Ohala, Manjari
1983 Aspects of Hindi Phonology. Delhi: Motilal Banarsidass.
Okrand, Marc
1979 Metathesis in Costanoan grammar. International Journal of American
Linguistics, 45: 123–130.
Parker, Steve
2002 Quantifying the sonority hierarchy. Ph.D. dissertation, Department
of Linguistics, University of Massachusetts, Amherst.
Parker, Steve
2008 Sound level protrusions as physical correlates of sonority. Journal of
Phonetics, 36: 55–90.
Penzl, Herbert
1955 A Grammar of Pashto: A Descriptive Study of the Dialect of Kanda-
har, Afghanistan. Washington, D.C.: American Council of Learned
Societies.
Pike, Kenneth and Eunice Pike
1947 Immediate constituents of Mazateco syllables. International Journal
of American Linguistics, 13(2): 78–91.
Rabel, Lili
1961 Khasi, a Language of Assam. Baton Rouge: Louisiana State Univer-
sity Press.
Rex, Eileen and Mareike Schttelndreyer
1973 Sistema fonolgico del Cato [Phonological systems of Catio].
Sistemas Fonolgicos de Idiomas Colombianos 2, 7385. Bogot:
Ministerio de Gobierno. (In Spanish).
Rialland, Annie
1994 The phonology and phonetics of extrasyllabicity in French. In Patricia
Keating (ed.), Phonological Structure and Phonetic Form: Papers in
Laboratory Phonology 3, 136–159. Cambridge: Cambridge Univer-
sity Press.
Riehl, Anastasia
2008 The phonology and phonetics of Nasal-Obstruent sequences. Ph.D.
dissertation, Department of Linguistics, Cornell University.
Riggs, Venda
1949 Alternate phonemic analysis of Comanche. International Journal of
American Linguistics, 15: 229–231.
Rögnvaldsson, Eiríkur
1993 Íslensk Hljóðkerfisfræði [Icelandic Phonology]. Reykjavík: Málvís-
indastofnun Háskóla Íslands. (In Icelandic).
Rusanivskyi, V. M. (ed.)
1986 Ukrainskaya Grammatika [Ukrainian Grammar]. Kiev: Naukova
dumka. (In Russian).
Sapir, Edward
1923 The Phonetics of Haida. International Journal of American Linguis-
tics, 2(3/4): 143–158.
Sawicka, Irena
1974 Struktura Grup Spółgłoskowych w Językach Słowiańskich [Structure
of Consonantal Clusters in Slavic Languages]. Wrocław: Zakład
Narodowy im. Ossolińskich. (In Polish).
Scatton, Ernest A.
1984 A Reference Grammar of Modern Bulgarian. Cambridge: Slavica
Publishers.
Schaub, Willi
1985 Babungo. London: Croom Helm.
Selkirk, Elisabeth
1982 The syllable. In Harry van der Hulst and Norval Smith (eds.), The
Structure of Phonological Representations. Dordrecht: Foris Publica-
tions.
Selkirk, Elisabeth
1984 On the major class features and syllable theory. In Morris Halle,
Mark Aronoff and Richard T. Oehrle (eds.), Language Sound Struc-
ture: Studies in Phonology, 107–136. Cambridge, Massachusetts:
MIT Press.
68 Rina Kreitman
Ventzel, Tatiana V.
1983 The Gypsy Language. Moscow: Nauka Publication.
Watkins, Justin
2002 The Phonetics of Wa. Canberra: Pacific Linguistics.
Winter, Werner
1962 Problems of Armenian phonology III. Language, 38(3): 254–262.
Westbury, John and Patricia Keating
1986 On the naturalness of stop consonant voicing. Journal of Linguistics,
22: 145–166.
Wetzels, W. Leo and Joan Mascaró
2001 The typology of voicing and devoicing. Language, 77(2): 207–244.
Wheeler, Max
2005 Voicing contrast: licensed by prosody or licensed by cue? ROA
769, Rutgers Optimality Archive, http://roa.rutgers.edu/.
Wiese, Richard
1996 The Phonology of German. Oxford: Clarendon Press.
Wonderly, William L.
1951 Zoque II: Phonemes and morphophonemes. International Journal of
American Linguistics, 17(2): 105–123.
Wright, Richard
1996 Consonant clusters and cue preservation in Tsou. Ph.D. dissertation,
Department of Linguistics, University of California Los Angeles.
Yoshioka, Hirohide, Anders Löfqvist and René Collier
1982 Laryngeal adjustments in Dutch voiceless obstruent production.
Annual Bulletin of the Research Institute of Logopedics and Pho-
niatrics, 16: 27–35.
Zec, Draga
1988 Sonority constraints on prosodic structure. Ph.D. dissertation,
Department of Linguistics, Stanford University.
Zec, Draga
1995 Sonority constraints on syllable structure. Phonology, 12: 85–129.
Limited consonant clusters in OV languages
Abstract
It has been claimed that the complexity of syllable structure correlates with the order
of verb and object in the languages of the world: syllable structure in OV languages
is simpler than that in VO languages. However, our analysis of data in Maddieson (2005)
and Dryer (2005) seems to show that a number of OV languages have (moderately)
complex syllable structure. In spite of this result, we argue that the syllable structure
in OV languages is simpler than has been reported, by considering the geographical
gradience of coda variety, coda inventory, phonological simplification, particles
attached to nouns, and complement-head orders other than OV/VO. We also discuss
why OV languages have simple syllable structure: it is argued that juncture between
constituents is stronger in left-branching structure (OV) than in right-branching struc-
ture (VO); strong juncture in left-branching structure makes words closely connected
to each other; simple syllable structure such as CV fits nicely into the stronger juncture
without making a consonant cluster.
1. Introduction
It has been pointed out that languages with object-verb order (OV) tend to
have simple syllable structure (Lehmann 1973, Gil 1986, Plank 1998). This
is the case in some OV languages such as Ijo, Yareba and Warao, whose syl-
lable form is CV. However, examination of data in Haspelmath et al. (2005)
(henceforth WALS) shows that a number of OV languages have (moderately)
complex syllable structure.
In this paper, we argue that the syllable structure in OV languages is sim-
pler than has been reported, by showing that consonant clusters are limited at
word boundaries and between words in OV languages. We base our argument
on only a small number of example languages but hope that these will be
sufficient to demonstrate the viability of our research proposal. From a conceptual
and theoretical point of view, we also discuss the reason why OV languages
should have simple syllable structure.
In Section 2, we review the previous studies of the correlation between syl-
lable complexity and word order. We also examine the correlation hypothesis
using data from WALS. In Section 3, we argue that syllable structure in OV
languages is simpler than it looks if we consider geographical gradation, simplifi-
cation processes and limited coda inventory. Section 4 discusses why OV
72 Hisao Tokizaki and Yasutomo Kuwana
languages have simple syllable structure; we argue that juncture between con-
stituents is stronger in left-branching structure (OV) than in right-branching
structure (VO). Section 5 concludes the discussion.
The two observations in (1) and (2) predict that there will be considerable differ-
ences between SOV and SVO languages with respect to syllable complexity.
Gil (1986) tests the correlation between OV/VO order and syllable struc-
ture with his 170 sample languages. He reports the average number of
segments in the syllable structure templates as SOV 4.04 < SVO 4.93. How-
ever, this result is not very convincing because the difference between SOV
and SVO is less than one segment (0.89). Moreover, the number of sample languages
is not large enough to claim (1) and (2) as universals across languages; it is
necessary, therefore, to test the hypothesis with more data.
2. Maddieson (2009) admits the crudity of this three-way distinction of syllable com-
plexity, and proposes a refinement of syllable typology by scoring the complexity
of onset, nucleus and coda, as shown in (i)–(iii).
These results do not seem to show the expected correlation between
object-verb order and syllable structure that we saw in (1) (i.e.
OV → simple syllable) and (2) (i.e. VO → complex syllable) above. Even
worse, the 23 languages with simple syllable structure and VO order outnumber
the 18 languages with simple syllable structure and OV order. The
60 languages with complex syllable structure and OV order outnumber the 47
languages with complex syllable structure and VO order. These data are in fact
the opposite of what we expected, given the previous studies we have seen
above. It may be that the results can be improved by refining our quantitative
approach.
First, Dryer (1992, 2009) argues that typological work should not be based
on the number of languages, but on the number of genera. Genera are "groups
of languages whose similarity is such that their genetic relatedness is
uncontroversial" (Dryer 1992: 84). Dryer argues that counting genera rather than
languages controls for the most severe genetic bias. Counting the numbers of
genera instead of languages slightly improves the results, as shown in Table 2.
The 17 genera with simple syllable structure and OV order outnumber the 16
genera with simple syllable structure and VO order. However, the 48 genera
with complex syllable structure and OV order still outnumber the 38 genera
with complex syllable structure and VO order.
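The language-based and genus-based counts just cited can be compared with a quick tally (a sketch hard-coding only the four-cell totals reported above; replicating the result properly would require the full WALS data):

```python
# Four-cell totals reported in the text: word order x syllable complexity.
languages = {("OV", "simple"): 18, ("VO", "simple"): 23,
             ("OV", "complex"): 60, ("VO", "complex"): 47}
genera = {("OV", "simple"): 17, ("VO", "simple"): 16,
          ("OV", "complex"): 48, ("VO", "complex"): 38}

def simple_share(table, order):
    """Proportion of units with the given word order whose syllable
    structure is simple."""
    simple = table[(order, "simple")]
    return simple / (simple + table[(order, "complex")])

for name, table in (("languages", languages), ("genera", genera)):
    print(name,
          f"OV {simple_share(table, 'OV'):.2f}",
          f"VO {simple_share(table, 'VO'):.2f}")
```

On these totals the OV simple-syllable share rises from about 0.23 (languages) to 0.26 (genera) while the VO share falls from about 0.33 to 0.30, which is one way to read the "slight improvement" from counting genera.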
Second, Dryer (1992, 2009) argues that genera should also be divided into
six macro areas. He emphasizes that it is dangerous to use data from raw totals
of languages without examining their distribution over areas. Dividing genera
into macro areas gives Table 3.
Table 3 shows that there are more OV genera than VO genera with simple
syllable structure in (d) Australia (7:1) and (f) South America (8:3). However,
these areas also have more OV genera than VO genera with complex syllable
structure, i.e. (d) Australia (9:1) and (f) South America (7:1). In the other
areas, (a) Africa, (b) Eurasia, (c) South East Asia and (e) North America, the
number of OV genera with simple syllable structure is not more than that of
VO genera with simple syllable structure. These results show that the data in
WALS do not give straightforward support for the hypothesis that OV languages
have simple syllable structure.
However, in the next section we argue that OV languages do have simple
syllable structure if we consider the geographical gradation of the variety of
word-final consonants, the fine classification of syllable complexity and head-
complement orders, the coda inventory and the simplification of syllable
structure within words and between words.
enable us to see possible correlations with other features such as word orders.
For example, syllable complexity should be defined on the basis of the number
and variety of coda consonants. Hashimoto (1978) argues that both coda and
tone are simpler in north Asia than in south Asia, as shown in Table 4.
Table 4. Number of tones and codas in Asian languages (cf. Hashimoto 1978)
Table 5. Number of tones, coda variety and complement-head orders (+) (Stem-Suffix,
Genitive-Noun, Adjective-Noun, Noun Phrase-Postposition, Object-Verb,
Clause-Adverbial Subordinator)
Although the data are insufficient in some cases, Table 6 shows a tendency:
as the number of segments increases, the value of head-complement orders
increases. Except for the languages with two, ve and nine segments in a
syllable, which have the head-complement scores of 1.33, 0.92 and 2.33
respectively (italicized), the HC score gradually increases from 1.45 to 2.50.
This result at least shows that we can expect a fine correlation between syllable
complexity and head-complement orders including OV/VO order.
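One plausible way to operationalize such a head-complement (HC) score (an illustrative encoding of our own, not necessarily the authors' exact computation) is to give a language one point for each complement-head order it shows among the six listed in the caption of Table 5, and then average the scores of the languages sharing a given syllable-template size:

```python
# The six complement-head orders listed in the caption of Table 5.
ORDERS = ("Stem-Suffix", "Genitive-Noun", "Adjective-Noun",
          "Noun Phrase-Postposition", "Object-Verb",
          "Clause-Adverbial Subordinator")

def hc_score(features):
    """Count how many of the six orders are complement-head (head-final).
    `features` maps order names to True/False."""
    return sum(bool(features.get(order)) for order in ORDERS)

def mean_hc(langs):
    """Average HC score over a group of languages, e.g. all languages
    whose syllable template has the same number of segments."""
    return sum(hc_score(f) for f in langs) / len(langs)

# Toy illustration: a consistently head-final language vs a mostly
# head-initial one (both feature settings are hypothetical).
head_final = {order: True for order in ORDERS}
head_initial = {"Adjective-Noun": True}
print(hc_score(head_final), hc_score(head_initial),
      mean_hc([head_final, head_initial]))  # 6 1 3.5
```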
5. The coda data for Kanuri, Korean, Tamil and Chukchi in list (5) are from VanDam
(2004). We also checked the other languages by analyzing the data in Kamei et al.
(1988–2001).
6. One might argue that our selection of languages in this section and the next is arbi-
trary. We admit that we have not checked all languages in a principled manner.
However, the point of our argument is to show that there are at least a number of
OV languages whose syllable structure is simpler than previously reported, and
that this is an area worthy of future investigation.
9. We need to consider the reason why -kwa instead of -wa is used after a word
ending with a consonant to make a consonant cluster. Another remaining problem
is why the genitive case marker -uy does not have another form with an onset
consonant.
We have argued that OV languages tend to have simple syllable structure with
fewer consonant clusters between words and within words. In this section, we
consider why word orders correlate with syllable structure. Tokizaki (2008)
argues that left-branching structure has stronger juncture between its con-
stituents than right-branching structure. The juncture between B and C in left-
branching (16a) is stronger than the juncture between A and B in right-
branching (16b).
(16) a. [[A B] C]
b. [A [B C]]
In this sense, the juncture is asymmetrical between left-branching and right-
branching structure. Tokizaki (2008) shows phonological and morpho-syntactic
evidence for this junctural asymmetry. Let us review some of the arguments
about Japanese and Korean presented there and discuss some new data from
Dutch and German. First, consider Rendaku (sequential voicing) in Japanese,
which applies to the first consonant in a word preceded by another word ending
with a vowel. For example, the first consonant of the second word in (17a)
and (17b) is voiced when it is part of a compound.
and syllable structure in languages. Let us consider how simple syllable struc-
ture allows an object to move to the left of the verb to make left-branching
structure. For example, a verb phrase tends to have right-branching structure
in a head-initial language (24a), and left-branching structure in a head-final
language (24b).
(24) a. [VP V [NP .. N ..]]
b. [VP [NP .. N ..] V]
However, if we assume the left/right-branching asymmetry discussed above,
head-final languages in fact have compound-like verb phrases.
(25) [V [ .. N .. ] V]
The object and the verb in (25), separated only by a weak bracket (represented
by ] ), are more closely connected to each other than the object and the verb in
(24a), which are separated by a strong boundary. Simple syllable structure
such as CV fits nicely into the stronger juncture in (25) without making a
consonant cluster, as in (26).
(26) [V [ .. CV ] CV]
VO languages, then, are allowed to have complex syllable structure because
strong boundaries separate the coda of the verb and the onset of the object as
shown in (27).
(27) [VP .. CCCVCC [NP CCCVCC .. ]]
Thus, left/right-branching asymmetry gives us an interesting way to explain a
correlation between syntax and phonology.10
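The bracket-based asymmetry can be made concrete with a small parser (an illustrative encoding of our own, not Tokizaki's formal definition): between two adjacent words, an intervening closing bracket marks a weak boundary (strong juncture), an opening bracket a strong boundary (weak juncture), and no bracket at all leaves the words inside the same minimal phrase.

```python
import re

def boundaries(bracketing):
    """For each pair of adjacent words in a bracketed string, report the
    boundary type: 'strong' if an opening bracket intervenes, 'weak' if
    only closing brackets intervene, 'none' if no bracket intervenes."""
    tokens = re.findall(r"[\[\]]|\w+", bracketing)
    result, pending, last_word = [], [], None
    for tok in tokens:
        if tok in "[]":
            pending.append(tok)
            continue
        if last_word is not None:
            if "[" in pending:
                kind = "strong"
            elif "]" in pending:
                kind = "weak"
            else:
                kind = "none"
            result.append((last_word, tok, kind))
        last_word, pending = tok, []
    return result

print(boundaries("[[A B] C]"))  # B-C boundary is weak (left-branching)
print(boundaries("[A [B C]]"))  # A-B boundary is strong (right-branching)
```

On this encoding, B and C in [[A B] C] are separated only by a weak "]" boundary, while A and B in [A [B C]] are separated by a strong "[" boundary, mirroring the junctural asymmetry stated for (16).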
5. Conclusion
We have seen that data in WALS do not show a clear correspondence between
OV languages and simple syllable structure. However, we have argued that
this is partly due to the crude distinction of syllable complexity in
WALS. We have pointed out that we should take into account the geographical
gradience of coda variety, coda inventory, phonological simplification and
particles attached to nouns, as well as complement-head orders other than OV/VO.
10. Mehler et al. (2004) report experimental work showing the correlation between head-
complement order and rhythm, i.e. head-complement = stress-timed vs. complement-
head = mora-timed. Although it is based on data from only fourteen languages, the
result seems to apply to other languages as well.
Acknowledgments
We would like to thank Theo Vennemann for invaluable comments and sugges-
tions. We are also grateful to Bingfu Lu for his comments on Chinese dialects.
This work is supported by Grant-in-Aid for Scientific Research (A20242010,
C18520388) and Sapporo University.
References
Rastorgueva, Vera S.
1964 A short sketch of the grammar of Persian, (translated by Steven P.
Hill; edited by Herbert H. Paper.) Bloomington: Indiana University.
Shiraishi, Hidetoshi
2006 Topics in Nivkh phonology. Groningen Dissertations in Linguistics
61. University of Groningen.
Tokizaki, Hisao
2008 Symmetry and asymmetry in the syntax-phonology interface. On-in
Kenkyu (Phonological Studies) 11, 123–130.
VanDam, Mark
2004 Word final coda typology. Journal of Universal Language 5: 119–148.
Wagner, Michael
2005 Asymmetries in prosodic domain formation. MIT Working Papers in
Linguistics 49, 329–367.
Weiers, Michael
2003 Moghol. In: Juha Janhunen (ed.) The Mongolic languages, London:
Routledge, 248–264.
Manner, place and voice interactions in Greek
cluster phonotactics
Marina Tzakosta
Abstract
This paper evaluates cluster formation and cluster well-formedness in Greek on the
basis of three distinct scales, namely the scale of manner of articulation, the scale of
place of articulation and the scale of voicing. The proposal of this paper is that the
classical Sonority Scale (cf. Selkirk 1984, Steriade 1982) and the bi-dimensional model
proposed by Morelli (1999) in which cluster formation is evaluated on the basis of two
distinct scales, i.e. the manner and place scales, are not adequate to account for cluster
formation and cluster well-formedness. According to the present proposal, in addition
to the scales of manner and place, voicing is crucial for cluster well-formedness and
needs to constitute a distinct scale. Voicing actually defines a cluster as an acceptable
tautosyllabic sequence. Well-formedness is driven by the rightward satisfaction of the
scales in combination with the Distance (D) holding among cluster members. Different
degrees of satisfaction of the scales and different distances holding among cluster
members result in different degrees of cluster well-formedness. The theoretical claims
expressed here are tested through Greek dialectal and developmental data but aim at
having cross-linguistic value. The current proposal further contributes to the establish-
ment of principles governing syllabification.
1. Introduction
1. In stress-to-weight systems stress adds weight to the syllable that carries it while in
weight-to-stress systems stress falls on heavy syllables.
rise in sonority from left to right; therefore, stops are the least sonorous seg-
ments whereas vowels are the most sonorous ones. The notion of sonority
was first introduced by Sievers (1901) and further developed by Jespersen
(1904). Jespersen proposes the classication of phonemes in terms of sonority.
Sonority is considered to be a universal principle dependent on phonological
grounds. Moreover, there are acoustic studies which further support its universal
cross-linguistic character (cf. Jany et al. 2007).
Sonority is a gradient notion in the sense that it is comparative; for example,
stops are less sonorous than fricatives and both are less sonorous than vowels.
Moreover, the more sonorous a segment is, the more likely it is to occupy
a syllabic nucleus position. Conversely, the less sonorous a segment is, the
more likely it is to be part of a syllabic onset or a syllabic coda. Given the
above, a syllable is a contour schema rising in sonority towards the nucleus
and falling in sonority towards the coda. Rightward satisfaction of the scale
implies that, for example, stops may cluster with any consonant type to their
right on the scale and result in well-formed clusters. However, fricatives can
cluster with all consonant types except for stops which are located to their
left. Therefore, according to the SonS, FAFFR,2 FN, FL clusters are perfectly
acceptable, but FS3 sequences are not.
2. S stands for stops, F for fricatives, AFFR for affricates, N for nasals, L for laterals
and rhotics, G for glides, V for vowels and C for obstruent consonants, i.e. stops
and fricatives.
3. Morelli (1999) suggests that the systematic occurrence of obstruent clusters must
be explained in sonority-independent terms. She suggests that the sonority scale
should be divided into two distinct scales, one for PoA and one for MoA, along
which generalizations can be made. According to her proposal, FS sequences are
the only well-formed clusters in Greek. /s/ clusters are also unmarked along both
dimensions. However, Greek allows not only for FS clusters but also for SF, FS,
SS and FF sequences.
SD (sonority distance), on the other hand, a notion qualitative in nature, determines the degree
of cluster well-formedness (cf. Clements 1988, 1990, 1992). More specically,
cluster members separated by the largest possible rising sonority distance
make up the best-formed clusters. Numbers on the SonS signal
the distance among cluster members. Consequently, a SF cluster like /px/ with
a SD (1) is less well-formed compared to SL sequences like /pl/ with SD (4),
though both are well-formed clusters. Therefore, SD presupposes that cluster
well-formedness is marked by different degrees of cluster perfection and
acceptability. Put differently, cluster perfection is signaled by the biggest
possible sonority distance among cluster members (the minimal distance
being (1)), while cluster acceptability is signaled, in most cases, by (0) dis-
tance among cluster members; (0) distance is attested when cluster members
share the same manner of articulation, place of articulation or voicing.
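The SD computation can be sketched as follows (a toy model: the integer indices follow the left-to-right ordering S < F < AFFR < N < L of note 2, but their exact values, and the segment-to-class table, are illustrative assumptions of ours):

```python
# Sonority indices: only the differences between them matter here.
SONORITY = {"S": 0, "F": 1, "AFFR": 2, "N": 3, "L": 4}

# Hypothetical manner classes for the segments in the examples above.
MANNER = {"p": "S", "t": "S", "x": "F", "f": "F", "n": "N", "l": "L"}

def sonority_distance(c1, c2):
    """SD of a two-member cluster: sonority of the second member minus
    sonority of the first (positive means rising sonority)."""
    return SONORITY[MANNER[c2]] - SONORITY[MANNER[c1]]

def classify(c1, c2):
    sd = sonority_distance(c1, c2)
    if sd >= 1:
        return "perfect"        # rising sonority, SD at least (1)
    if sd == 0:
        return "acceptable"     # members share the same landing point
    return "non-acceptable"     # falling sonority (leftward on the SonS)

print(sonority_distance("p", "x"), classify("p", "x"))  # 1 perfect
print(sonority_distance("p", "l"), classify("p", "l"))  # 4 perfect
print(classify("x", "p"))  # non-acceptable (fricative before stop)
```

On this encoding /px/ (SD 1) and /pl/ (SD 4) are both perfect, with /pl/ the better formed of the two, while a fricative-stop sequence comes out non-acceptable, in line with the classification above.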
Gradience in cluster formation is one of the core issues of the present study
and will be discussed in detail. It is important to mention that Lass (1984)
has proposed a mirror image of the SonS, namely, the Scale of Consonantal
Strength (hereafter SConS). On the SConS, phonemes are evaluated and inter-
related not with respect to sonority but with respect to their strength. Therefore,
while vowels are the most sonorous and stops are the least sonorous segments
on the SonS, stops are the strongest segments while vowels are the weakest
segments on the SConS.
Claims like the ones discussed above allow us to make certain predictions
regarding cluster realization and, implicitly, cluster perception. More specifically,
if the SonS and SD govern cluster perfection, we expect that a perfect
cluster would be perceptually more salient than an acceptable cluster; as a
result, the former would have more chances to remain intact in its surface/
phonetic realization. In other words, we would expect that the SonS and SD
drive clarity of perception which, in turn, facilitates production. Consequently,
CL rather than CC clusters are expected to emerge more frequently not only
cross-linguistically but also in various aspects of a language (i.e. its dialectal
varieties, L1 and L2 data, language disorders). The accuracy of the above
assumptions is reinforced by the fact that multiple repair strategies, such as
epenthesis, deletion or fusion, apply in clusters with small SD, like SF or FN,
whereas clusters with big SD, like SL, are characterized by vowel anaptyxis.
These assumptions have been tested and verified by Greek L1 and L2 experi-
mental and developmental data in Tzakosta (2009) and Tzakosta and Vis
(2009a, 2009b, 2009c).
Although there is solid argumentation regarding the universal as well as
(per language) parametric factors that determine the formation of consonant
clusters at the level of the SonS and SD, little has been said regarding the
internal coherence of consonant clusters and additional factors which drive
2. The problem
c. CG .jos empty-ADJ.MASC.NOM.SG.
d. CN a.km acme/prosperity-FEM.NOM.SG.
.nos nation-NEUT.NOM.SG.
e. NN a.mne.si.a amnesia-FEM.NOM.SG.
In this study, the focus is on two-member CL and CC consonant clusters
because these cluster types, first, display a great variety of possible combinations
in Greek, second, taken together they are the most frequently attested (Protopapas
et al. in press), and, third, they differ radically regarding their phonological
representation. More specically and regarding this latter parameter, CC
sequences have tight phonological representations similar to those of complex
segments, and, consequently, they are difficult to perceive and produce. On the
contrary, CL clusters have a loose phonological representation, therefore,
they are assumed to be easy to perceive (Tzakosta and Vis 2009a). CC and CL
phonological representations are depicted in figures (2a) and (2b), respectively.
sibilants are fricatives, the former behave differently from other fricatives. Not
coincidentally, sibilants, when they appear in onset position, are considered to be
extrasyllabic segments. Moreover, we assume that this flexible and extrasyllabic
role of /s/ makes /st/ the most frequently attested cluster in Greek (Protopa-
pas et al. in press). It is important to note that, though excluded, these cluster
types reinforce our present account. For a relevant discussion see Tzakosta (in
press).
The major question underlying this study refers to the types of consonant
clusters emerging in various aspects of a language system. More specically,
Greek is characterized by constraints that limit the types of clusters allowed
in standard Greek. However, dialectal as well as developmental Greek L1 and
experimental L2 data reveal that clusters not allowed in the standard language
are allowed in other aspects of Greek. It will be shown that segments which
are unmarked under a theory of Markedness, and therefore expected to
surface earlier and more accurately in L1 and L2, are replaced by more
complex segments/sequences.
Such facts suggest that a theoretical account of the segmental composition
of clusters based on Feature Geometry and Underspecification is explanatorily
inadequate. There are additional questions related to the above claims. For
example, if CL are perfect clusters due to SD why do non-perfect clusters,
such as CC, emerge massively in Greek dialects and language development?
Why do clusters not allowed by the phonotactics of standard Greek emerge
in dialectal and developmental data? These topics will be addressed in the
discussion that follows.
Based on the questions raised in the previous section, the current
study has the following aims: first, to investigate the production patterns of
CL and CC clusters, with the additional aim of testing whether all cluster types
have the same survival chances in their surface realization, and, second,
to provide a typological account of CL and CC cluster formation in dialectal
varieties of Greek, L1 acquisition and L2 learning.
Our major working hypothesis is that the SonS is not adequate to explain
cluster formation. Rather, cluster formation should be evaluated on the basis
of three distinct scales of manner of articulation (hereafter MoA), place of
articulation (hereafter PoA) and voicing. More specically, we propose that
the MoA scale controls good cluster sonority (Clements 1988, 1990), the
PoA scale registers the satisfaction of the fixed place hierarchy (Prince and
Smolensky 1993), while voicing refines cluster formation.
4. Data sources
For the purposes of the present study we draw on data from three corpora:
first, indexed dialectal data (Tzakosta 2010, Tzakosta and Karra 2011) from
the major dialectal zones of Greek, namely, Dialects of Northern Greece
(Epirus, Meleniko, Lesvos, Pontos, Thassos, Corfu, Attica, Thessalia, Kozani,
Trikala, Samothraki, Thessaloniki, Koutsovlahika) and Dialects of Southern
Greece (Cyprus, Crete, Dodekanese, Ikaria). Data indexation was achieved
through the detailed study of grammars, atlases and dictionaries of Greek
dialects. No oral speech dialectal data were recorded.
The second corpus consisted of naturalistic Greek L1 developmental data
from 6 monolingual children whose ages ranged between 1;07 and 3;05 years.
The data were collected on the basis of a) a semi-structured technique of
picture naming and b) through free interaction with the children (Tzakosta
2004). The data were recorded and broadly transcribed using IPA.
The third corpus consisted of naturalistic Greek L2 data selected from
groups with different L1 backgrounds: first, 10 Dutch monolingual adults with
an age range between 25 and 60 years and of intermediate proficiency level, and,
second, 3 Romanian monolingual adults with an age range between 27 and 51 years
and of intermediate and advanced proficiency level. The data collection technique
used was structured questionnaires (cf. Tzakosta 2006). Data from both groups
were recorded and broadly transcribed using IPA.
It is important to mention that our study is qualitative in nature. Therefore,
we focus on the patterns of consonant clusters that emerge in Greek varieties,
L1 and L2, rather than on the frequencies of their surface realization. Conse-
quently, we do not provide statistical analyses or input frequency effects.6
6. For statistical analyses related to the topic of the current study the interested reader
may refer to Tzakosta (2009, 2010).
The D among the members of // is (3), while it is (4) among the members of
/pl/. A necessary condition for the formation of a perfect cluster is the minimal
satisfaction of all scales, i.e. with D (1).
Acceptable clusters are consonantal sequences consisting of members
mostly sharing the same landing point on the SonS. In cluster /pt/, for exam-
ple, both cluster members are voiceless stops; they only differ with respect
to place of articulation. In the discussion of the current proposal, we will high-
light the fact that acceptable clusters need to at least (vacuously) satisfy one of
the three scales.
Finally, non-acceptable clusters are consonantal sequences not respecting
the SonS. In other words, non-acceptable clusters are formed by consonants
selected in a leftward direction on the SonS, like /p/ whose first member
is a fricative and the second is a stop. Following Tzakosta (2009), and, as
already mentioned, we assume that different patterns in the production of CL
and CC clusters are due to differences in the clusters' perceptual load. Different
perceptual loads are due to distinct phonological representations. In other
words, complex phonological representations mirror heavy perceptual loads
while non-complex representations mirror light perceptual loads, as shown
in figures (2a) and (2b) above.
The problem arises because the SonS treats segments as inseparable wholes,
providing information only about the principles which govern cluster
formation, without giving any information about why certain clusters are
better- or worse-formed than others. According to the current proposal, the
SonS should be evaluated separately with respect to MoA, PoA and voicing
in order to assess subtle cluster differentiations. Given the cluster categoriza-
tion suggested above, we suggest that perfect, acceptable and non-acceptable
cluster formation depends on the degree of satisfaction of the scales of
manner, place and voicing, which are illustrated in figures 3, 4 and 5, respec-
tively. Like the classical SonS, all scales need to be satisfied in a rightward
manner. However, not all clusters are perfect to the same extent, since, as
already mentioned, cluster perfection is gradient; the bigger the D among
cluster members on all scales the better-formed the cluster. The minimal possi-
ble D for perfect clusters is (1) and the maximal is (4).
The manner scale in fig. 3 draws heavily on the classical SonS. In the data
in (3), (3d) is an example of a cluster which satisfies the manner scale, though
with the minimal possible D; the stop is the leftmost cluster member, while the
fricative is the rightmost one. In other words, /p/ in (3d) is a perfect cluster
on the manner scale with the minimal possible distance (1) holding among its
members. (3a–c), on the other hand, are instances of clusters which vacuously
satisfy the manner scale because their members land at the same point on the
scale, i.e. they are both either stops or fricatives. Cluster members sharing the
same manner of articulation form acceptable clusters. In addition, in (3a–b)
both cluster members are stops. It is interesting that in (3c) stop /p/ changes
to fricative /v/ and, consequently, a minimally perfect cluster becomes, due
to its fricative members, an acceptable one. It is important to mention again
that the difference between a minimally perfect and an acceptable cluster is the
D holding among their members. In a minimally perfect cluster D should be
(1), while in an acceptable cluster it is (0).
(3) a. /a..ti.kos/ [a..tkus] different-ADJ.MASC.NOM.SG.
b. /a.po.k.to/ [a.pk.tus] underneath-ADV.
(Meleniko, Andriotes 1989)
c. /pe./ [v] child-NEUT.NOM.SG.
d. /pi.a.m/ [pa.m] span-FEM.NOM.SG.
(Thessalia, Tzartzanos 1909)
On the other hand, the place scale depicted in fig. 4 is equivalent to the
fixed place hierarchy proposed by Prince and Smolensky (1993). According
to this hierarchy, velars are more marked than labials, and labials are
more marked than coronals. Translating the fixed place hierarchy
into the place scale proposed here means that a velar or a labial needs to be
the leftmost member of a cluster if a coronal is the rightmost one. Accord-
ingly, in order to form a perfect cluster, if the second member of a cluster is a
labial, the rst member needs to be a velar.
The data in (4) provide evidence that the place scale is satisfied, though
input clusters are slightly changed in their output realization due to D. More
specifically, in (4a) the cluster /l/, which is perfect at the manner level, becomes
// in order to achieve perfection at the place level as well. More specifically,
// and /l/ make up an acceptable cluster given that both segments land at the
same point on the place scale they are both coronals with D (0); however,
substitution of // for /f/ creates D (1) on the place scale among the members
of the newly formed cluster. Similarly, in (4b), although /v/ and /l/ make up
a perfect cluster whose members are marked by D (1), /v/ is substituted for
// in order to achieve an even bigger D (2). Again, data such as those in (4b)
illustrate that cluster perfection and acceptability are gradient. Certain clusters
are better than others due to D; the bigger the D among cluster members, the
better-formed a cluster is at the level of perfection and acceptability. In other
words, clusters whose members differ even minimally with respect to MoA
and/or PoA are preferred to those sharing the same MoA and/or PoA. Finally,
(4c) is a mirror case to those described in (4a) and (4b); more specifically,
although (4a) and (4b) illustrate instances of better-formed perfect clusters
compared to (4c), (4c) exemplifies that perfect clusters may be substituted for
acceptable ones. Acceptable clusters are characterized by a small, minimal or
even zero, D among their members on at least one of the three scales. In /f/,
the manner and voicing scales are vacuously satisfied, whereas the place scale
is minimally satisfied with D (1).7
(4) a. /li.ve.rs/ [i.vi.rs] depressing-ADJ.MASC.NOM.SG.
b. /vl.po/ [l.po] see-1SG.PRES. (Meleniko, Andriotes 1989)
c. /l.vo.me/ [f.vo.me] be sad-1SG.PRES.
(Pontos, Oikonomides 1958)
Finally, the voicing scale in fig. 5 is the least complex scale, given that
segments may be either [−voiced] or [+voiced]. According to this scale, a
perfect cluster is one whose first member is [−voiced] and whose second is
[+voiced]. The converse voicing order is responsible for the formation of non-
acceptable clusters. Consonants sharing the same voicing characteristics, i.e.
both voiceless or both voiced, form acceptable clusters.8
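The scale evaluation described here is essentially algorithmic, so a small sketch may help make the computation of D concrete. This is an illustration, not the paper's formalism: the scale orderings below (manner: stop before fricative; place: velar before labial before coronal; voicing: voiceless before voiced) are reconstructed from the prose, and the paper's figures remain the authoritative statements of the scales.

```python
# Sketch of the D (distance) computation behind the perfect/acceptable
# distinction. Scale orderings are assumptions reconstructed from the prose.

MANNER = {"stop": 0, "fricative": 1}
PLACE = {"velar": 0, "labial": 1, "coronal": 2}
VOICING = {"voiceless": 0, "voiced": 1}

def distance(scale, c1, c2):
    """D between C1 and C2 (read left to right) on one scale."""
    return scale[c2] - scale[c1]

def classify(d):
    """D >= 1 satisfies the scale (D == 1 is 'minimally perfect');
    D == 0 satisfies it only vacuously; a negative D violates it."""
    if d >= 1:
        return "perfect"
    if d == 0:
        return "acceptable"
    return "non-acceptable"

# stop + stop: same point on the manner scale, D (0) -> acceptable
print(classify(distance(MANNER, "stop", "stop")))        # acceptable
# stop + fricative: D (1) -> minimally perfect on the manner scale
print(classify(distance(MANNER, "stop", "fricative")))   # perfect
# velar + coronal: D (2) on the place scale -> perfect
print(classify(distance(PLACE, "velar", "coronal")))     # perfect
```

On this encoding, the gradience discussed in the text falls out directly: the larger the (positive) D between the members, the better-formed the cluster.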
Voicing has primarily been dealt with in connection with voicing and devoicing
alternations emerging mostly in Germanic languages (cf. Oostendorp 2004,
2006, among others) and assimilatory processes (cf. Al-Ahmadi Al-Harbi to
appear, Arvaniti 1999, Baroni 1997, Grijzenhout 2000). Such phenomena have
mostly been accounted for within OT by means of the *NC, ND, *ND con-
straints, which allow or forbid NC or ND sequences to emerge (cf. Borowsky
2000, Grijzenhout 2000, Lombardi 1995, 1999, Pater 1999).9 In order to
establish a voicing scale in our proposal, the motivating question was the
following: if voice assimilation applies to non-adjacent consonants and within
consonant clusters and, at the same time, [−voi] + [+voi] clusters like /k/ are
acceptable and attested in the norm and in dialectal data, why are [+voi] +
[−voi] clusters, like /k/, non-acceptable and, in fact, non-emergent in any
aspect of Greek?
The data in (5a–c) illustrate the rightward satisfaction of the voicing scale:
the first member of the cluster is voiceless while the second is voiced. Data
(5d–e) highlight the creation of clusters which share the same voicing charac-
teristics. Finally, the data in (5f–i) pinpoint cases of regressive devoicing assimi-
lation; it is interesting that both voiced and voiceless segments may drive
assimilation, as shown in (5f–i), respectively. We assume that in languages
like Greek, in which both voiced and voiceless segments are allowed in all
word positions (which means that neither voicing nor devoicing is preferred),
assimilation of both voicing and devoicing is allowed. All clusters in (5) are
acceptable (only (5c) is perfect, because it minimally satisfies all scales),
because they all vacuously satisfy at least one scale. In order to be perfect,
the clusters in (5) should at least minimally satisfy all scales.
10. Cf. Blaho and Bye (2006) for equivalent cross-linguistic results.
11. For the conditions under which the voicing scale may be violated see Tzakosta
(2009b).
12. Scale vacuous satisfaction is characteristic only of acceptable clusters.
There is still another important question to be addressed: why are data such
as those in (6) attested in different aspects of Greek? More specifically, why
are acceptable clusters preferred to perfect ones? First of all, all data in (6)
except (6e) are the result of vowel loss. Apparently, the combination of the
newly adjacent consonants is valid on the basis of the three scales; therefore,
acceptable clusters emerge. However, it is difficult for the present proposal to
account for cases such as (6e), in which a perfect cluster is substituted for an
acceptable one. We assume that (6e) is rather a case of cluster misperception
which has become established in the dialect over time. This is apparently an
issue that is still open for discussion.
the latter. (7b) and (7d) violate both the manner and the place scale, but due
to the vacuous satisfaction of the voicing scale the output clusters are acceptable.
(7) a. /a.ft/ [a.pt], [a.ft] this-DEM.PR. (B: 1;11.27)
b. /sxo.l.o/ [xo.l.o] school-NEUT.NOM.SG. (D: 2;07.06)
c. /o.br.la/ [ku.bl.la] umbrella-FEM.NOM.SG. (Me:1;11.22)
d. /p.sxa/ [p.ka] Easter-NEUT.NOM.SG. (B.M.: 2;09.25)
Dutch and Romanian learners of Greek exhibit equivalent data, as exemplified
in (8) and (9), respectively.
(8) a. /fo.to.ra.f.a/ [fo.to.xra.f.a] photo-FEM.NOM.SG. (S1)
b. /gri..ris/ [kri.ni..ris] nasty-ADJ.MASC.NOM.SG. (S2)
c. /e.vo.m.a/ [e.vdo.m.da] week-FEM.NOM.SG. (S3)
a. /o.ri.kt/ [fri.kt] tanker-NEUT.NOM.SG. (S2)
c. /u.ra.ns/ [i.ra.ns] sky-MASC.NOM.SG. (S3)
d. /e.po./ [e.p.ksi] season-FEM.NOM.SG. (S4)
e. /c.ni.si/ [kl.si] circulation-FEM.NOM.SG. (S5)
(9) a. /f.ri.o/ [fto.r] fluorine-NEUT.NOM.SG. (S3)
b. /e.vo.m.a/ [e.vdo.m.da] week-FEM.NOM.SG. (S1)
c. /a.v/ [a.vg] egg-NEUT.NOM.SG. (S2)
d. /xte.n.zo/ [kte.n.zo] comb-1SG.PRES. (S1)
e. /.no/ [gd.no] denude-1SG.PRES. (S2)
We assume that the preference for acceptable clusters is an indication of
the freer cluster formation mechanisms characteristic of Greek dialects but
also of other aspects of the language. Dialects, especially those of the northern
dialectal zone, are less conservative regarding cluster synthesis, given that
clusters may appear in coda position due to the application of phonological
rules by which high vowel loss and/or raising apply in unstressed syllables
(Newton 1972). This allows various acceptable clusters to appear extensively
in the surface realization. In acceptable clusters, consonantal combinations are
freer than those of a perfect cluster, given that D (0) allows a high number of
consonantal sequences to emerge. Therefore, the number of acceptable clusters
is higher than that of perfect clusters.
Cluster formation gradience is illustrated in tables 1–3. Table 1 illustrates
the segmental combinations which result in gradience in cluster formation at
the level of manner of articulation. Table 2 displays gradience at the level of
place of articulation, while table 3 presents gradience at the level of voicing.
References
Al-Ahmadi Al-Harbi
To appear English voicing assimilation: Input-to-output [voice] and Output-
to-Input [voice]. Journal of King Abdulaziz University 13.
Andriotes, Panagiotes
1989 The Dialect of Meleniko [in Greek]. Thessaloniki: Publications of
the Society of Macedonian Studies.
Arvaniti, Amalia
1999 Greek voiced stops: Prosody, syllabification, underlying representa-
tions or selection of the optimal? Proceedings of the 3rd Interna-
tional Conference of Greek Linguistics. 883390. Athens: Ellinika
Grammata.
Baroni, Marco
1997 The representation of prefixed forms in the Italian lexicon: Evidence
from the distribution of intervocalic [s] and [z] in northern Italian.
M.A. Thesis, Department of Linguistics, UCLA.
Manner, place and voice interactions in Greek cluster phonotactics 113
Tzakosta, Marina
2006 Developmental paths in L1 and L2 phonological acquisition: conso-
nant clusters in the speech of native speakers and Turkish and Dutch
learners of Greek. In Andrianna Belletti, Elisa Bennati, Cristiano
Chesi, Elisa di Domenico and Ida Ferrari (eds.), Language Acquisi-
tion and Development: Proceedings of GALA 2005, Generative Ap-
proaches in Language Acquisition. 536–549. Cambridge: Cambridge
Scholars Press.
Tzakosta, Marina
2009 Asymmetries in /s/ cluster production and their implications for
language learning and language teaching. Proceedings of the 18th
International Symposium of Theoretical and Applied Linguistics.
365–373. Department of English Language and Linguistics: Aristotle
University of Thessaloniki.
Tzakosta, Marina
2010 The importance of being voiced: cluster formation in dialectal
variants of Greek. In Angela Ralli, Brian Joseph, Marc Janse and
Athanasios Karasimos (eds.), E-proceedings of the 4th international
Conference of Modern Greek Dialects and Linguistic Theory. 213–
223. University of Patras. http://www.philology.upatras.gr/LMGD/
el/index.html (ISSN: 1792-3743).
Tzakosta, Marina
In press Consonantal interactions in dialectal variants of Greek: a typological
approach to three-member consonant clusters. Greek Dialectology
6.
Tzakosta, Marina and Athanasia Karra
2011 A typological and comparative account of CL and CC clusters in
Greek dialects. In Marc Janse, Brian Joseph, Angela Ralli and Spyros
Armosti (eds.), Studies in Modern Greek Dialects and Linguistic
Theory I. 95105. Nicosia: Kykkos Cultural Research Centre.
Tzakosta, Marina and Jeroen Vis
2009a Asymmetries of consonant sequences in perception and production:
affricates vs. /s/ clusters. In Anastasios Tsangalidis (ed.), Selected
Papers from the 18th International Symposium on Theoretical and
Applied Linguistics. 375–384. Department of English Language
and Linguistics: Aristotle University of Thessaloniki: Monochromia.
Tzakosta, Marina and Jeroen Vis
2009b Perception and production asymmetries in Greek: evidence from the
phonological representation of CC clusters in child and adult speech.
Greek Linguistics 29: 553–565.
Tzakosta, Marina and Jeroen Vis
2009c Phonological representations of consonant sequences: the case of
affricates vs. true clusters. In Georgios K. Giannakis, Mary Baltazani,
Georgios I. Xydopoulos and Tassos Tsaggalidis (eds.), E-proceed-
ings of the 8th International Conference of Greek Linguistics
Zsuzsa Várnai
Abstract
The purpose of this paper is to present a description of the clusters of Samoyedic languages:
Nenets (Tundra), Enets, Nganasan and Selkup (Taz dialect), which are endangered
Uralic languages spoken in northern Siberia in Russia.
In this paper I will give an account of the syllable types attested in root lexemes and
discuss the constraints that apply to the constituents of the syllable in four examined
languages. Despite the fact that these languages are historically and geographically
very close to each other, they have different syllable structures, and they choose differ-
ent processes to adapt borrowed clusters from Russian. I will focus on the similarities
and differences between these languages with respect to the processes affecting clusters
in Russian loanwords. Russian is counted as having complex syllable structure, very
different from the Samoyedic languages.
After a brief description of the languages in question I define a syllable template and
the representation of the syllable for each language. Then I specify the possible com-
plexity of onset and coda, and I show what types of sequences exist in these languages
and what types do not. Then I discuss what happens in these languages to relatively old
Russian loanwords.
1. Introduction
Russian loanwords. Russian has many clusters, not only in word medial posi-
tion across syllable boundaries, but also in onset position at the beginning of
the word. My research questions are the following: How are Russian con-
sonant clusters treated in Samoyedic? Which types of clusters are retained,
and which ones are simplified in the course of borrowing from Russian?
What happens to branching onsets in Samoyedic languages? Which way do
they choose to adapt these clusters? Do they all choose the same way or dif-
ferent ways? Which types of sequences undergo simplification processes, and
what processes do they undergo?
1.1. Sources
The purpose of this paper is to present a description of the clusters of Samoyedic
languages, esp. of Nenets (Tundra), Enets, Nganasan and Selkup (Taz dialect),
which are endangered Uralic languages spoken in northern Siberia in Russia (see
map in Fig. 1). They have not yet been thoroughly investigated in the phono-
logical literature. Despite the fact that the languages in question are histori-
cally and geographically very close to each other, they have different syllable
structures, and they choose different processes to adapt borrowed clusters from
Russian. Russian is counted as having complex syllable structure (see WALS
2005), very different from the Samoyedic languages. It is remarkable that
different repair mechanisms are found for the same Russian cluster type.
NENETS, YURAK-SAMOYED
Territory / Region: Russia, Northeast Europe and Northwest Siberia: the
Yamal-Nenets and Khanty-Mansi Autonomous Areas in the Tyumen Region;
the Tajmyr Municipal District of the Krasnoyarsk Region; and the Nenets
Autonomous Area in the Arkhangelsk Region
Dialect: Tundra and Forest Nenets
Ethnic population: 41,302
Total number of speakers: 29,052
Finally, let me compare the linguistic situation of the four Samoyedic minor-
ities under review with Fishman's Graded Intergenerational Disruption Scale
(GIDS) (1991, 2001). He has designed a framework to assist speakers of an
endangered language in revitalizing their mother tongue and in reversing
language shift. We have relied on the model when identifying the threatened
status of the Uralic minority languages described above and assigned each to
the following GIDS levels:
Stage 8 So few fluent speakers that the community needs to re-establish
language norms; often requires outside experts (e.g., mostly native
speaker linguists).
Stage 7 Older generation uses language enthusiastically but children are not
learning it. L1 is only taught as L2.
Stage 6 Language and identity socialization of children takes place in home
and community.
Stage 5 Language socialization involves extensive literacy, usually including
non-formal L1 schooling.
Stage 4 L1 used in children's formal education in conjunction with national
or official language.
Stage 3 L1 used in workplaces of larger society, beyond normal L1
boundaries.
Stage 2 Lower governmental services and local mass media are open to L1.
Stage 1 L1 used at upper governmental level.
Assigning the four Uralic minority languages described above to these levels,
the following situation was found: their situation is alarming in general; Enets
and some Selkup dialects are at Stage 8; Nganasan and some Nenets and
Selkup dialects are at Stage 7. Only some reindeer herding Nenets communities
are at Stage 6.
Most of the sources used in the study provide only word lists without con-
text. They are usually written documents and dictionaries. The Nganasan and
Enets dictionaries are written for pupils of primary schools, including approx.
3,000 entries, while two others, the Selkup and Nenets ones, contain far more
entries. Alternative sources may not be useful for loanwords. Even though
there are many published texts of these languages, they are usually tales, folk-
lore texts, and stories with very few loanwords.
Nenets: Tereshchenko (1989), Tundra Nenets dialect
Enets: Sorokina & Bolina (2001), Tundra Enets dialect
Nganasan: Kosterkina, Momde & Zhdanova (2001)
Selkup: only the Taz dialect will be under investigation here (Helimski 2007)
124 Zsuzsa Várnai
2. The survey
the sequence an.ta is preferred (more natural, less marked) to ap.ta. Con-
sequently the most preferred heterosyllabic cluster is the sonorant-obstruent
(SO) cluster, and the obstruent-sonorant (OS) cluster is less well-formed.
Referring to the SSP and the SCL, I will determine the well- or ill-formedness
of the clusters.
In the next section we present the consonant systems and their distribution,
the most signicant phonotactical restrictions and regularities, and CC combina-
tions for each of the languages.
The following table shows which consonants can occur in the various syllabic
positions in this language:
[Table: occurrence of the Nenets consonants p pj t tj k b bj d dj c cj s sj x
m mj n nj l lj r rj w j in the positions #__, V__V, __C, C__ and __#; the
alignment of the + marks was lost in extraction.]
[Table: attested word-final C1C2 combinations in Nenets by class (C1: plosive,
affricate, spirant, nasal, liquid, glide; C2: obstruent vs. sonorant); the align-
ment of the +/++ marks was lost in extraction.]
The most frequent word-final types of clusters in Nenets are SO clusters. Ob-
struents occur frequently as the second constituent. Of the sonorants, only the
nasals can occur in C2 position; liquids and glides do not occur at all. Affri-
cates and spirants cannot form the first element of the cluster.
Transsyllabic clusters in Nenets are shown in Table 4. They are adjacent
segments belonging to two different syllables.
The distribution of transsyllabic cluster types in Nenets is slightly different
from that of final codas: affricates and spirants can be the first element of the
cluster, and liquids and glides can occur in C2 position.
There are also clusters of three elements in Nenets. They can occur in
medial and final position. In medial position the syllable boundary is after the
C2: C1C2$C3. They are generally all well-formed clusters from the viewpoint
of sonority. In syllable contact (i.e., intervocalic) clusters of three elements, C1
is most often a plosive, a liquid, a glide or a nasal, C2 is a glottal stop, an
obstruent, an affricate or a nasal, and C3 is an obstruent, a nasal or a glottal
stop. Their elements are never from the same class.
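The C1C2$C3 generalization can be stated procedurally. The sketch below is an illustration only: the vowel inventory and the example string are hypothetical, not Nenets data.

```python
# Place the Nenets-style syllable boundary '$' inside a medial CCC run:
# the boundary falls after the second consonant (C1C2$C3).
VOWELS = set("aeiouy")  # assumed transliteration vowels, not Nenets phonology

def split_medial_ccc(word):
    """Insert '$' between C2 and C3 of a V-CCC-V sequence."""
    out = []
    i = 0
    while i < len(word):
        out.append(word[i])
        # detect V C C C V starting at position i and split after C2
        if (i >= 1 and word[i - 1] in VOWELS and
                i + 3 < len(word) and
                all(c not in VOWELS for c in word[i:i + 3]) and
                word[i + 3] in VOWELS):
            out.append(word[i + 1])
            out.append("$")
            out.append(word[i + 2])
            i += 3
            continue
        i += 1
    return "".join(out)

print(split_medial_ccc("ampka"))  # hypothetical form -> amp$ka
```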
Consonant clusters in four Samoyedic languages 127
[Table 4: transsyllabic C1C2 combinations in Nenets by class (C1: plosive,
affricate, spirant, nasal, liquid, glide; C2: obstruent vs. sonorant); the align-
ment of the +/++ marks was lost in extraction.]
2.1.2. Enets
The classication of Enets consonants is shown in Table 5.
[Table 5: classification of Enets consonants by place (labial, dental, palatal,
velar, glottal) and manner (plosive, sibilant and spirant fricatives, nasal,
lateral, trill, glide), with voicing; the layout was lost in extraction. Two
spirants are noted as free variants of s and sj, respectively.]
The following table shows which consonants can occur in the various syllabic
positions in this language:
[Table: occurrence of the Enets consonants p t tj k b d dj g s sj x m n nj l lj
r j in the positions #__, V__V, __C, C__ and __#; the alignment of the +
marks was lost in extraction.]
The following phonotactic regularities apply in Enets: The onset can be empty
or filled, but it can only be non-complex. Thus both vowel- and consonant-
initial syllables are possible. The nucleus may be simple or branching in
Enets. Complex nuclei occur only when they dominate a single element as in
Nenets. Codas in this language may be empty or simple. There are no CCC
clusters in Enets.
Enets syllable contact cluster types are shown in Table 7. In Enets, some
intervocalic geminates can occur word medially: dd, gg, , ss.
[Table 7: Enets syllable contact C1C2 combinations by class (C1: plosive,
fricative, nasal, liquid, glide; C2: obstruent vs. sonorant); the alignment of
the +/++ marks was lost in extraction.]
The most frequent types of clusters in Enets are likewise SO. Both obstruents
and sonorants, except glides, occur as the second constituent.
2.1.3. Nganasan
The classication of Nganasan consonants is shown in Table 8.
[Table 8: classification of Nganasan consonants by place (labial, dental,
palatal, velar, glottal) and manner (plosive, sibilant and spirant fricatives,
nasal, lateral, trill); the layout and several footnoted symbols were lost in
extraction.]
The following table shows which consonants can occur in the various syllabic
positions in this language:
[Table: occurrence of the Nganasan consonants t tj k b d dj g s sj h m n nj
l lj r j in the positions #__V, V__V, __C, C__ and V__#; the alignment of
the + marks was lost in extraction.]
[Table 10: Nganasan syllable contact C1C2 combinations by class (C1:
plosive, fricative, nasal, liquid; C2: obstruent vs. sonorant); the alignment
of the +/++ marks was lost in extraction.]
2.1.4. Selkup
The Selkup consonant system is shown in Table 11.
The following table shows which consonants can occur in the various syllabic
positions in this language:
[Table: occurrence of the Selkup consonants p t k q t s m n nj l lj r j w in
the positions #__, V__V, __C, C__ and __#; the alignment of the + marks
was lost in extraction.]
[Table 13: Selkup syllable contact C1C2 combinations by class (C1: plosive,
affricate, fricative, nasal, liquid, glide; C2: obstruent vs. sonorant); the
alignment of the +/++ marks was lost in extraction.]
Consonants cannot appear as syllable nuclei in any of the four languages ana-
lysed here. There are no complex edge components in the Samoyedic languages
in any position, except for final complex codas in Nenets. After derivation and
inflection there could be C#CC, but a simplification process applies, deriving
C1C2C3 → C1C3. These languages have moderately complex syllable struc-
ture, which is the most frequent structure in the world's languages (247 of
485 studied languages have moderately complex syllable structure according
to WALS 2005); this means they permit a single consonant after the vowel
and/or allow two consonants to occur before the vowel, but adhere to a limita-
tion to only the common two-consonant patterns (Maddieson 2008, WALS
2005). Edge effects are active in Nganasan, where an initial onset is obligatory
whenever the nucleus is branching, and in Nenets, where there are medial and
final branching codas and final clusters with three constituents (CCC#).
Table 16 summarizes the information given in Tables 4, 7, 10, and 13 about
contact clusters (i.e., those straddling a syllable boundary) in the languages
investigated.
Accordingly, I will only analyse early Russian loans in the four Samoyedic
languages, and will not deal with later adoptions by the bilingual speech com-
munities.
Given the strong restrictions on onset and coda complexity in Samoyedic
languages, and the extensive range of clusters found in Russian, it is interest-
ing to examine the processes affecting Russian loanwords.
Not all clusters have been investigated in all languages; only those which
were represented in the sources can be discussed here. Gaps in the picture are
due to missing evidence, i.e. the relevant clusters do not occur in the dic-
tionaries or are not represented in the sample. Unfortunately we lack exten-
sive quantities of data, so we cannot make predictions but can only review
the regularities.
2.3.1.1. Epenthesis
The most frequent strategy is epenthesis. It is active in every position in all
four languages examined here. This, moreover, corresponds to crosslinguistic
data: epenthesis appears to be the most frequent adaptation process across
languages (Paradis and LaCharité 1997).
When vowel epenthesis is used to break up a consonant cluster, there is
often more than one location where the vowel could be placed to produce a
phonotactically acceptable output. For example, if a language has open syllable
structure {CV, V}, hence disallowing CC clusters at the beginning of a word,
an initial CCV can be broken up by putting a vowel before the consonants
(VC.CV), i.e. prothesis, or between the consonants (CV.CV), i.e. anaptyxis. In
a medial CCC cluster, the vowel could occur before the second or the third
consonant. The choice of epenthesis location is language-specific.
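The two options can be sketched as a minimal function. This is my own illustration under stated assumptions: plain Latin transliteration and a five-vowel check; the epenthetic vowel varies by language, as described below.

```python
# Candidate repairs for a word-initial CC cluster under an open-syllable
# template {CV, V}: prothesis inserts before the cluster, anaptyxis inside it.
VOWELS = set("aeiou")

def epenthesis_candidates(word, vowel):
    c1, c2 = word[0], word[1]
    if c1 in VOWELS or c2 in VOWELS:
        raise ValueError("word must begin with two consonants")
    return {
        "prothesis": vowel + word,                # V.CCV -> resyllabified VC.CV
        "anaptyxis": c1 + vowel + c2 + word[2:],  # CV.CV
    }

# Russian truba 'chimney, pipe': anaptyxis yields the Nganasan form turuba
# listed in the appendix; prothesis would yield utruba instead.
print(epenthesis_candidates("truba", "u"))
# {'prothesis': 'utruba', 'anaptyxis': 'turuba'}
```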
2.3.1.2. Prothesis
A particular type of epenthesis is when the vowel is inserted before the con-
sonant cluster at the beginning of the word; this is also known as prothesis.
That process can be observed in the Nenets, Nganasan and Selkup data. The
inserted vowel is a in Nenets and Nganasan and i in Selkup and in some cases
in Nganasan. Prothesis affects mostly sibilant + plosive clusters at the beginn-
ning of the word.
2.3.1.3. C1-deletion
In general, vowel epenthesis seems to be a heavily preferred repair type in
loanword adaptation. Uffmann (2007) surveys case studies of loanword adap-
tation and concludes that consonant deletion is a marginal phenomenon com-
pared to epenthesis: adding extra segments is less undesirable than deleting
segments from the word (Paradis and LaCharité 1997). C1-deletion affects
only Russian tautosyllabic clusters in the onset. It is active in each of the four
languages:
Nenets:
stakan > takan cup
škola > kola school
vtulka > tulka lead shot
Enets:
škola > kola school
stjeklo > tjeklo glass
Nganasan:
skamejka > kamejka bench
škola > kol school
Selkup:
spirt > pirt spirit
zdarovatj-sja > tarowatt-qo to welcome
We should mention that the same cluster can be affected by several different
processes; i.e., Russian word-initial sibilant + plosive clusters can be borrowed
into Nenets with C1-deletion or prothesis (see the discussion in 2.3.2 below).
2.3.1.4. C2 -deletion
This is a very interesting repair process. In general, when truncation occurs it
eliminates the first consonant of the cluster. We have only three pieces of data
for C2-deletion. This repair strategy is active only in Selkup, affecting two
intersyllabic clusters and one onset cluster. The Russian complex sibilant +
plosive onset cluster is resolved by two types of truncation in Selkup:
zdarovatj-sja > C1-deletion: tarowatt-qo and C2-deletion: sarowatt-qo. This
dichotomy is dialectal. Unfortunately, we have very little data; it would be
useful to get more examples of C2-deletion.
Selkup
zdarovatj-sja > sarowatt-qo to welcome
kukla > kuka puppet
nužda > nuža poverty
2.3.1.5. CV-metathesis
This adaptation strategy primarily affects initial onset clusters; it is not a com-
mon strategy, and its goal is to restructure the complex onset and to shift the
cluster to the syllable boundary:
CCVCV > CVCCV truba > turba
or
CVCCCVCV > CVCCVCCV kastrulja > kosturlja.
Enets
platok > poltok kerchief
truba > turba chimney, pipe
Selkup
krupa > kurpa cereals
kruptatka > kurtatka grits
kastrulja > kosturlja pot
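For the initial-cluster cases, the CCVCV > CVCCV pattern amounts to swapping C2 with the following vowel. A one-line sketch (my illustration, operating on plain transliterations):

```python
def cv_metathesis(word):
    """Shift an initial CC cluster onto the syllable boundary: CCV... -> CVC..."""
    c1, c2, v, rest = word[0], word[1], word[2], word[3:]
    return c1 + v + c2 + rest

print(cv_metathesis("truba"))  # turba, as in the Enets data
print(cv_metathesis("krupa"))  # kurpa, as in the Selkup data
```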
2.3.1.6. Syncope
Syncope is an extraordinary phenomenon, unique in the sample: it works only
in Enets and produces (rather than removes) syllable contact clusters. Presum-
ably the aim of this strategy is to make a trisyllabic word bisyllabic, because
bisyllabic structures are the most frequent ones in the Samoyedic languages.
Unfortunately we have very little data, only these two examples:
Enets
bumaga > bomga paper
malako > molka milk
2.3.1.7. Substitution
Substitution is a little different from the other strategies. It is not a restructur-
ing repair but a kind of assimilation in which non-native segments are mapped
onto the phonetically closest ones that are well-formed in the native phonology.
It affects mostly contact clusters in intersyllabic position, but it can also affect
single segments.
Nenets doktor toxtur doctor
flag plak flag
Enets lavka lapka store
Nganasan kanfety kmpet candy
lavka lapku store
Selkup počta pota / pocta post office
rovna romna exactly
Substitution can act together with a restructuring repair (epenthesis):
On = onset
Co = coda
The transsyllabic clusters are the second most frequent locus of repair pro-
cesses. We should mention that, according to our data, coda clusters are
resolved by epenthesis only.
In a voiceless sibilant + stop cluster, a vowel tends to be inserted before the
cluster, while in an obstruent + sonorant cluster a vowel tends to be inserted
into the cluster. In Table 19 we can examine what kinds of clusters are affected
by the various repair processes according to position in the Samoyedic languages.
On = onset F = fricative
Co = coda L = liquid
O = obstruent N = nasal
S = sonorant P = plosive
s = sibilant
There is no SS repair and SO repair is very rare. The most frequent types of
cluster affected by repair mechanisms are sP and PL clusters: they are affected
by all deletions, epenthesis, metathesis, and substitution as well. They are the
most unacceptable sequences in all three positions.
The resulting order for repair strategies according to the most frequent
cluster types is:
sP: epenthesis > deletion > substitution > metathesis
PL: epenthesis > metathesis / substitution > deletion
FP: substitution > deletion > syncope
LP: epenthesis / syncope
PN: epenthesis / substitution
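The ordering above can be restated as a small lookup table. The encoding below is mine, not the author's; it simply transcribes the ranking given in the text.

```python
# Frequency ordering of repair strategies per cluster type, as listed above.
# Slash-joined entries are ties in the original ranking.
REPAIR_ORDER = {
    "sP": ["epenthesis", "deletion", "substitution", "metathesis"],
    "PL": ["epenthesis", "metathesis/substitution", "deletion"],
    "FP": ["substitution", "deletion", "syncope"],
    "LP": ["epenthesis/syncope"],
    "PN": ["epenthesis/substitution"],
}

def preferred_repair(cluster_type):
    """Most frequent repair for a cluster type, per the ordering above."""
    return REPAIR_ORDER[cluster_type][0]

print(preferred_repair("sP"))  # epenthesis
```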
3. Conclusion
The Samoyedic languages permit consonant clusters, but they are very restric-
tive in terms of complex edge components. For example, most of them do not
permit initial consonant clusters, or more than two consecutive consonants in
other positions, especially at the same side of the syllable boundary. There are
Appendix
Nenets: V-epenthesis
onset (#CC)
Russian Nenets
gr gram xaram gramm
kr krupa xurupa cereals
kl klass xalas class
PL
OS kl kravat j xorovat j bed
dr drob j torob barrel
br brezent persent y
PN kn kniga xyika book
coda (CC#)
OS PL tr kilometr xilometra km
prothesis
OO sP šk škola askola / xaskola school
C1 -deletion
onset (#CC)
OO sP st stakan takan cup
stol tol table
šk škola kola school
škaf kap cupboard
FP vt vtulka tulka lead shot
substitution
C$C
FP ft kaftot ka xoptocka blouse
OO
PP kt doktor toxtur doctor
- fotograr- to take a
OS PL gr potokrapirujas j
ovat j photo
onset (#CC)
PL br brigada prigada brigade
OS
FL fl flag plak flag
Enets: V-epenthesis
onset (#CC)
Russian Enets
pla pala log
pl
plat j e palat j a dress
OS PL
br brevno beremno beam
kl j kl j ut kul j ut key
coda (CC#)
OS PL tr metr metra meter
SO LF rf arf arpa scarf
C1 -deletion
onset (#CC)
zdarova a doroba health,
zd
welcome
k kola kola school
sk skutno kuno boring
sp spasiba pasiba aj thanks
stakan takan cup
metathesis
onset (#CC)
pl platok poltok kerchief
OS PL chimney,
tr truba turba
pipe
syncope (CVC I C$C)
SO NP bumaga bomga paper
LP malako molka milk
substitution
C$C
vk I
OO FP lavka lapka store
pk
Nganasan: V-epenthesis
onset (#CC)
Russian Nganasan
brigada birigad brigade
br
br j uki buruk trousers
tr chimney,
truba turuba
pipe
PL
OS kl kladovka kolodovka chamber
krest j kirist chest
kr
krupa kyryh cereals
pl plan holan plan
PN k kiga kiig book
coda (CC#)
dr kedr kedr pine
OS PL tr metr metr meter
br nojabr njabri November
C$C
br fabrika hu abirik factory
PL tr natruska naturuska strainer
OS
kl uklad ukulat steel
PN d jm s j ed j moj s j ed j emi seventh
SS LN rm t j urma t j yryma prison
SO NP nt kontora kntor ofce
prothesis
k kola askol school
OO sP
st stul istul chair
SO Ls r ranoj ars j enj rye
C1 -deletion
onset (#CC)
sk skamejka kamejka bench
spasiba hu aiba thanks
sp
spravka horaapk certificate
stakan takan cup
OO sP st
stol tol table
zd zdarovat j - drbatu- to
sja dja welcome
k kola kol school
substitution
C$C
NF nf I
SO konfety kmpet candy
np
FP vk I
OO lavka lapku store
pk
Selkup: V-epenthesis
onset (#CC)
Russian Selkup
OO sP gr gruz kurus cargo
coda (CC#)
OO sP sp spirt pirta spirit
OS PL tr metr metra meter
Coregonus
l jdj s j el j d j sel j t j a
SO LP sardinella
lk j olk olka silk
C$C
OO sP st j st j eklo t j ekla glass
prothesis CC#
sk skamejka iskamjka bench
OO sP
st stol istol table
C1 -deletion
onset (#CC)
spirt pirt spirit
to get
sp sputat j - mixed,
putaji-qo
sja to get
confused
zd zdarovat j - tarowatt- to
OO sP
sja qo welcome
st stakan takan cup
sk skamejka kamejka bench
sp spasiba paipo thanks
st staro toru guard
C2-deletion
  onset (#CC)
   OO sP  zd  zdarovatʲsja → sarowatt-qo 'to welcome'
  C$C
   OO sP  žd  nužda → nuža 'poverty'
   OS PL  kl  kukla → kuka 'puppet'
metathesis
  onset (#CC)
   OS PL  kr  krupa → kurpa 'cereals'
          kr  kruptatka → kurtʲatka 'grits'
  C$C
   OO sP  st  kastrulʲa → kosturlʲa 'pot'
substitution
  C$C
   AP  tt  počta → potta ~ potʲta 'post office'
   PP  tk  kadka → katka 'tub'
   OO FF  fh  savhoz → sapko 'state farm'
   OO FP  fk  lavka → lapky 'store'
   OO sP  žd  nužda → nuta 'poverty'
   PN  dn  ladna → latno 'all right'
   OS sP  sk  natruska → natruška 'strainer'
   FN  vn  rovna → romna 'exactly'
   SO NA  nts  palatʲence → polotensa 'towel'
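The adaptation strategies tabulated above are mechanical enough to sketch as string operations. The toy Python functions below are not from the chapter and are illustrative only: they ignore the vowel-harmony, stress and phonotactic factors that determine the actual quality of epenthetic and prothetic vowels, and merely mirror the attested patterns truba → turuba, stol → istol and stakan → takan.

```python
def epenthesis(word):
    """Break an initial CC cluster by copying the first stem vowel
    between the two consonants (cf. truba -> turuba)."""
    c1, c2, rest = word[0], word[1], word[2:]
    vowel = next(ch for ch in rest if ch in "aeiouy")
    return c1 + vowel + c2 + rest

def prothesis(word, vowel="i"):
    """Prepend a vowel before an initial sC cluster (cf. stol -> istol)."""
    return vowel + word

def c1_deletion(word):
    """Drop the first consonant of an initial cluster (cf. stakan -> takan)."""
    return word[1:]

print(epenthesis("truba"))    # turuba
print(prothesis("stol"))      # istol
print(c1_deletion("stakan"))  # takan
```

All three outputs match adapted forms attested in the tables, though the real adaptations also adjust segments (e.g. Nganasan plan → holan) in ways this sketch does not capture.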
References
Kang, Yoonjung
2003 Perceptual similarity in loanword adaptation: English postvocalic
word-final stops in Korean. Phonology 20: 219–273.
Kazakevich, Olga
2006 The functioning of the indigenous minority languages in the Yamalo-
Nenets autonomous area, Turukhansk district of the Krasnoyarsk
territory and Evenki autonomous area. http://lingsib.iea.ras.ru/en/
round_table/papers/kazakevich1.shtml
Kenstowicz, Michael
1994 Phonology in Generative Grammar. Oxford: Blackwell.
Kenstowicz, Michael
2003 Salience and similarity in loanword adaptation: A case study from
Fijian. To appear in Language Sciences.
Kosterkina, N. T., A. Č. Momde and T. Ju. Ždanova
2001 Nganasan-Russian and Russian-Nganasan Dictionary [in Russian].
Saint Petersburg.
Krivonogov, V. P.
1998 Ethnological processes in Central Siberian minorities [in Russian].
Kuznetsova, A. I., E. A. Helimskij and E. V. Grushkina
1980 Collection of Selkup language, Taz dialect, Vol. I [in Russian].
Moscow.
McConnell, G. D. and V. Mikhalchenko (eds.)
2003 Written languages of the world: Languages of the Russian Federation
[in Russian]. Moscow.
Maddieson, Ian
2008 Syllable structure. In: Martin Haspelmath, Matthew Dryer, David
Gil, and Bernard Comrie (eds.), The World Atlas of Language Struc-
tures Online, chapter 12. Munich: Max Planck Digital Library.
Available online at http://wals.info/feature/12
Murray, Robert W. and Theo Vennemann
1983 Sound change and syllable structure in Germanic phonology. Lan-
guage 59(3): 514–528.
Paradis, Carole and Darlene LaCharité
1997 Preservation and minimality in loanword adaptation. Journal of Lin-
guistics 33: 379–430.
Salminen, Tapani
1997 Tundra Nenets Inflection. Mémoires de la Société Finno-Ougrienne
227. Helsinki.
Salminen, Tapani
1998 A Morphological Dictionary of Tundra Nenets. Lexica Societatis
Fenno-Ugricae 26. Helsinki.
Sipos, Mária, Katalin Sipőcz, Zsuzsa Várnai and Beáta Wagner-Nagy
2007 The current sociolinguistic situation of some Uralic peoples. Paper
read at the 11th International Conference on Minority Languages
(ICML XI). Pécs, 5–6 July 2007.
Sorokina, I. P. and D. S. Bolina
2001 Enets-Russian and Russian-Enets Dictionary [in Russian]. Saint
Petersburg.
Tereščenko, N. M.
1966a Selkup [in Russian]. In: V. Lytkin (ed.), Languages of the USSR,
Vol. 3: Finno-Ugric and Samoyedic languages. Moscow.
Tereščenko, N. M.
1966b Nenets [in Russian]. In: V. Lytkin (ed.), Languages of the USSR,
Vol. 3: Finno-Ugric and Samoyedic languages, 376–395. Moscow.
Tereščenko, N. M.
1966c Nganasan [in Russian]. In: V. Lytkin (ed.), Languages of the USSR,
Vol. 3: Finno-Ugric and Samoyedic languages, 416–437. Moscow.
Tereščenko, N. M.
1966d Enets [in Russian]. In: V. Lytkin (ed.), Languages of the USSR,
Vol. 3: Finno-Ugric and Samoyedic languages, 438–457. Moscow.
Tereščenko, N. M.
1979 Nganasan [in Russian]. Leningrad.
Tereščenko, N. M.
1989 Nenets-Russian Dictionary [in Russian]. Moscow.
Thomason, Sarah G. and Terrence Kaufman
1988 Language Contact, Creolization, and Genetic Linguistics. Berkeley:
University of California Press.
Uffmann, Christian
2007 Vowel Epenthesis in Loanword Adaptation. Tübingen: Max Niemeyer
Verlag.
Várnai, Zsuzsa
2002 Hangtan [Phonology and phonetics]. In: Beáta Wagner-Nagy (ed.),
Chrestomathia Nganasanica. Studia Uralo-Altaica Supplementum
10, 33–70. Szeged.
Várnai, Zsuzsa
2003 Valóban morás nyelv-e a nganaszan? [Is Nganasan really mora-
counting?] In: Zoltán Molnár and Gábor Zaicz (eds.), Permistica et
Uralica. FUP I, 268–271. Piliscsaba.
Várnai, Zsuzsa
2004 A nganaszan nyelv fonológiai leírása [The phonological description
of Nganasan]. Ph.D. dissertation, Department of Uralistics, Eötvös
Loránd University, Budapest.
Várnai, Zsuzsa
2005 Some problems of Nganasan phonology: Mora or syllable? In: Beáta
Wagner-Nagy (ed.), Mikola konferencia, 113–126. Szeged.
Várnai, Zsuzsa
Phonology, phonotactics, morphonology. In: Beáta Wagner-Nagy (ed.),
Descriptive Grammar of Nganasan [manuscript].
Vennemann, Theo
1988 Preference Laws for Syllable Structure and the Explanation of Sound
Change: With Special Reference to German, Germanic, Italian, and
Latin. Berlin: Mouton de Gruyter.
Part II. Production: analysis and models
Articulatory coordination and the syllabification of
word initial consonant clusters in Italian
Abstract
In this study we investigate the articulatory coordination of word initial consonant
clusters in Italian. We show that these clusters are generally coordinated in a similar
way to clusters in languages with complex syllable onsets, in that the timing of the
rightmost consonantal gesture in relation to the vocalic gesture is adjusted according
to the number of consonants in the cluster.
However, clusters containing a sibilant, /s/ or /z/, are an exception and show a
different coordination pattern altogether. Such clusters are referred to as having an
impure s, mainly as a result of allomorphy of indefinite and definite articles (e.g. il
premio, but lo studente). In such cases, the sibilant does not affect the coordination of
the remaining consonants, indicating that it may not be part of the syllable onset.
1. Introduction
This study takes an articulatory approach to the syllabic parsing of word initial
clusters in Italian within the framework of Articulatory Phonology (Browman
and Goldstein 1988). In this model, the coordination patterns relating to con-
sonants and vowels have been shown to reflect syllable structure in different
languages (Browman and Goldstein 2000; Marin and Pouplier 2010 for
American English, Goldstein et al. 2007 for Georgian and Tashlhiyt Berber,
Shaw et al. 2009 for Moroccan Arabic).
Articulatory Phonology models articulatory movements in terms of con-
sonantal and vocalic gestures. These are coupled in relation to each other in
specific ways, reflecting the status of the respective consonants and vowels
within the syllable. In CV syllables, the C and V gestures are coupled in-phase,
indicating a simultaneous initiation of these two gestures. This reflects the
onset-nucleus relation. In VC syllables, by contrast, the V and C gestures are
coupled in anti-phase relation, and are thus initiated sequentially. This reflects
the nucleus-coda relation.
Crucially, syllables with complex onsets, CCV, are modelled as having
two competing coupling modes. On the one hand both C gestures are coupled
in-phase with the V gesture. On the other, the two C gestures are coupled in
anti-phase to each other, such that they do not start simultaneously, aiding
158 Anne Hermes, Martine Grice, Doris Mücke and Henrik Niemann
1. In what follows we refer to /s/ and /z/ as /s/ for the purpose of simplification.
Voicing is not distinctive in this position but rather conditioned by the voicing of
the following consonant.
In complex onsets, consonants are in-phase with the vowel and at the same
time anti-phase with each other (Nam and Saltzman 2003, Goldstein et al.
2007). This competitive coupling in complex onsets is present on the surface
as the C-center effect (Browman and Goldstein 2000), where the mean of all
consonantal targets (C-center) is aligned at a stable timing point relative to
the vocalic target. Thus, the distance between the mean of targets for CC in
CCV and for CCC in CCCV is comparable to the midpoint for C in CV. As a
result of this, the rightmost consonant within the cluster is shifted further
towards the vowel with every added consonant. This rightmost shift has
recently been confirmed for Georgian (Goldstein et al. 2007). Other languages,
such as Tashlhiyt Berber (Goldstein et al. 2007) and Moroccan Arabic (Shaw
and Gafos 2008, Shaw et al. 2009), have been analysed as not allowing com-
plex onsets. In these latter studies the rightmost consonant in a cluster has a
stable timing with the vowel, regardless of the size of word initial clusters,
thus conrming the analysis whereby the rightmost consonant is the only one
included in the syllable onset. These studies indicate that it is possible to
recover signatures of syllable structure from the timing of articulatory move-
ments, especially from the gestural timing of the rightmost consonant in
clusters relative to the vocalic anchor.
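The C-center relation described above can be made concrete with a small numerical sketch (the latency and spacing values below are invented for illustration, not data from any of the studies cited): if the mean of the consonantal targets stays at a fixed distance from the vowel target, the rightmost consonant necessarily moves toward the vowel as consonants are added.

```python
def rightmost_c_to_v(n_consonants, c_center_to_v=150.0, ic_interval=60.0):
    """Latency (ms) from the rightmost C target to the V target when the
    C-center (mean of all C targets) stays c_center_to_v ms before V and
    consonantal targets are spaced ic_interval ms apart (invented values)."""
    # With n targets spaced ic_interval apart, the rightmost target lies
    # (n - 1) / 2 intervals after the mean of the targets.
    return c_center_to_v - (n_consonants - 1) / 2 * ic_interval

for n, shape in [(1, "CV"), (2, "CCV"), (3, "CCCV")]:
    print(shape, rightmost_c_to_v(n))  # 150.0, 120.0, 90.0: rightward shift
```

Under a simplex-onset organization (Moroccan Arabic, Tashlhiyt Berber), by contrast, the rightmost-C-to-V latency would stay at 150 ms regardless of cluster size.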
Figure 2. Coupling graph (a) and schematised articulatory patterns (b) for onsets in
English, adapted from Saltzman et al. (2006).
kept constant (e.g. Berber: /mun – smun – tsmun/) as opposed to English (e.g.
sayed – spayed – splayed) in earlier work (e.g. Browman and Goldstein 2000).
The rightmost C variable is hypothesised to decrease (rightward shift) when
comparing single onsets with non-sibilant clusters (where consonants are
syllabified as part of the onset). For sibilant clusters, it is assumed that the
rightmost consonant within the cluster is not shifted, but remains at a stable
timing point, indicating that the sibilant is not part of the onset.
2. Method
2.1. Speakers
We recorded two native Italian speakers, one female speaker (MS) in her
mid-forties from Apulia in Southern Italy and one male speaker (AR) in his
mid-thirties from Trentino, in Northern Italy. Both speakers spent their rst
thirty years in their hometowns.
CC vs. /s/+CC word initially, keeping the rightmost consonant constant. The
word list is shown in Table 1.
The target words were embedded in the carrier sentence Per favore dimmi
la __ di nuovo (Please say the __ again), ensuring an alternation of high and
low vowels throughout the sequence.
Table 1. Wordlist
C                      CC                    /s/+CC
/rema/ (rheme)         /prema/ (press)       /sprema/ (squeeze)
/rima/ (rhyme)         /prima/ (first)       /sprima/ (logatome)
/lina/ (proper name)   /plina/ (logatome)    /splina/ (logatome)
C                      /s/+C
/pina/ (proper name)   /spina/ (thorn)
/fila/ (line)          /sfila/ (s/he unthreads)
/vita/ (life)          /svita/ (s/he unscrews)
2.3. Recordings
The recordings took place at the IfL Phonetics laboratory in Cologne. The
speech material was displayed on a computer monitor. Target words were
produced in pseudo-randomised order, each being spoken 10 times in total.
Speakers were instructed to speak at a rate they considered to be comfortable.
Acoustic and kinematic data were recorded simultaneously.
We recorded the acoustic signal with a DAT-recorder (TASCAM DA-P1)
using a condenser microphone (AKG C420 head set) and digitised at 44.1
kHz/16 bit.
The kinematic data was recorded with a 2D electromagnetic midsagittal
articulograph (Carstens AG100; 10 channels). We placed 2 sensors on upper
and lower lip and 3 sensors on the tongue: tongue tip, tongue blade and tongue
body (1cm, 2cm and 4cm behind the tongue tip). Two additional sensors on
the bridge of the nose and the upper gums served as references in order to
correct for head movements during the recordings (see Hoole 1996).
All kinematic data were sampled at 400 Hz, downsampled to 200 Hz and
smoothed with a low-pass filter at 40 Hz. For displaying and labelling data, all
acoustic and kinematic data were converted to SSFF-format to enable the data
to be analysed and annotated in the EMU Speech Database System (Cassidy &
Harrington 2001).
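As a rough illustration of this processing chain (not the actual filtering used in the study, which applied a designed 40 Hz low-pass filter), decimation and smoothing can be sketched in a few lines of Python; the moving average here is only a crude stand-in for a proper low-pass filter.

```python
def decimate_by_2(signal):
    """Downsample 400 Hz -> 200 Hz by keeping every second sample
    (real pipelines low-pass first to avoid aliasing)."""
    return signal[::2]

def moving_average(signal, width=5):
    """Crude low-pass smoothing via a symmetric moving average
    (a stand-in for the study's 40 Hz filter, not equivalent to it)."""
    half = width // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [sum(padded[i:i + width]) / width for i in range(len(signal))]

# Ten samples of a synthetic movement at 400 Hz:
samples_400hz = [0.0, 0.1, 0.4, 0.9, 1.0, 0.9, 0.4, 0.1, 0.0, 0.0]
smoothed = moving_average(decimate_by_2(samples_400hz))
print(len(smoothed))  # 5 samples, now at 200 Hz
```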
2.4. Labelling Procedure
All acoustic and articulatory landmarks were displayed and labelled by hand.
We labelled the onset and offset of the target word and its acoustically defined
segments. In the present study only the articulatory landmarks are reported on.
The remaining labels were placed in relation to the articulatory record. We
labelled movements in the vertical dimension, identifying minima and maxima
in the respective velocity trace (zero crossings). For vowel-to-vowel articula-
tion, we labelled the vocalic target for /i,e/. For consonants, we labelled
the maximum targets of the primary constrictors (Byrd 2000), whereas labial
consonants were identied by using the lip aperture index (LA, Byrd 2000).
Figure 3 illustrates how the landmarks are annotated for those measures.
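The landmark logic, locating positional extrema as zero crossings of the velocity trace, can be sketched as follows; this is a simplified, hypothetical reimplementation for illustration, not the labelling code used in the study.

```python
def find_targets(positions, fs=200.0):
    """Label gestural targets as zero crossings of the velocity trace,
    for a position trace sampled at fs Hz (200 Hz after downsampling)."""
    dt = 1.0 / fs
    # central-difference velocity; v[i] is the velocity at sample i + 1
    v = [(positions[i + 1] - positions[i - 1]) / (2 * dt)
         for i in range(1, len(positions) - 1)]
    targets = []
    for i in range(len(v) - 1):
        # sign change (or touch of zero) between consecutive velocity samples
        if v[i] > 0 >= v[i + 1] or v[i] < 0 <= v[i + 1]:
            targets.append(i + 2 if abs(v[i + 1]) < abs(v[i]) else i + 1)
    return targets

# A synthetic tongue-tip trace with one raising-lowering gesture:
trace = [0.0, 0.5, 1.5, 2.5, 3.0, 2.5, 1.5, 0.5, 0.0]
print(find_targets(trace))  # [4]: the positional maximum (constriction target)
```

A real trace would first be smoothed, and labial consonants would use the lip-aperture index rather than a single sensor's vertical position, as noted above.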
Figure 3. Labelling scheme for test word /plina/ in Per favore dimmi la plina di nuovo.
From top to bottom: acoustic waveform, kinematic waveform for vertical
tongue-tip position, inter-lip distance and vertical tongue-body position.
3. Results
We measured the distance of the rightmost C to the V target in 293 tokens for
both speakers; 7 utterances were discarded from the analysis, due to technical
problems. An overall ANOVA with rightmost C as dependent variable re-
vealed significance for the independent variable onset complexity (C, CC,
/s/+C, /s/+CC; p < 0.05) and for speaker (p < 0.01; speaker as random
factor). We therefore used one-way-ANOVAs for each speaker separately
including the dependent variable rightmost C and the independent variable
onset complexity.
Rightmost C to V (ms), mean (SD)
Speaker   Pair         C          CC         F-value    p-value
MS        rema-prema   151 (11)   124 (6)    47.255     ***
MS        rima-prima   166 (11)   117 (7)    141.699    ***
MS        lina-plina   203 (12)   165 (21)   22.693     ***
AR        rema-prema   189 (16)   140 (21)   27.279     ***
AR        rima-prima   182 (20)   122 (23)   40.574     ***
AR        lina-plina   227 (27)   155 (28)   33.812     ***
In all cases (p < 0.001) the consonant is shifted considerably towards the
vowel (for speaker MS: in /rema/ vs. /prema/ on average 27 ms; in /rima/ vs.
/prima/ on average 49 ms; in /lina/ vs. /plina/ on average 38 ms; for speaker
AR: in /rema/ vs. /prema/ on average 49 ms; in /rima/ vs. /prima/ on average
60 ms; in /lina/ vs. /plina/ on average 72 ms). Figure 5 shows graphically the
considerable decrease of the rightmost C variable in C vs. CC structured
target words.
Rightmost C to V (ms), mean (SD)
Speaker   Pair   C   /s/+C   F-value   p-value
For both speakers in all cases, there is no difference in the timing from
the rightmost C to the vocalic target, when comparing C to /s/+C clusters
(p > 0.05, n.s.). Although a sibilant is added to the beginning of the word,
the rightmost C is not adjusted relative to the vowel, i.e. latencies remain stable.
In figure 6 the results are presented graphically. Comparing the bars for
each word pair (C vs. /s/+C), we found no decrease of the distance of the
rightmost C to V target. The latencies remain the same. That was the case for
speaker MS in /pina/ (~241 ms) vs. /spina/ (~243 ms), /fila/ (~189 ms) vs.
Rightmost C to V (ms), mean (SD)
Speaker   Pair           CC         /s/+CC     F-value   p-value
MS        prema-sprema   124 (6)    128 (12)   0.835     n.s.
MS        prima-sprima   117 (7)    113 (9)    1.405     n.s.
MS        plina-splina   165 (21)   158 (15)   2.047     n.s.
AR        prema-sprema   140 (21)   135 (13)   0.424     n.s.
AR        prima-sprima   122 (23)   134 (21)   1.455     n.s.
AR        plina-splina   155 (28)   158 (15)   0.067     n.s.
4. Discussion
These results on articulatory coordination provide evidence for complex
onsets in Italian (CC clusters). In the analysis of the target words C and
CC, we found a decrease in the distance between the rightmost C target and
the V target. The second C target in the cluster is shifted towards the vowel.
This supports the hypothesis of an underlying competitive coupling structure
These results show that /s/ does not exhibit the articulatory timing patterns
required for membership of the syllable onset, in that the rightmost C target is
at a constant distance from the V target. This is true for all analysed target
words containing an impure s for both speakers. In other words, adding the
sibilant to the onset of a word does not affect the timing of the other con-
sonants relative to the vocalic target. Thus, there is no evidence for an under-
lying competitive coupling structure between /s/ and the other consonants.
Figure 8. Schematised articulatory pattern and coupling graphs for C vs. CC cluster
(a), C vs. /s/+C clusters (b) and C vs. CC vs. /s/+CC (c) clusters in Italian.
The same holds for /s/+CC compared to CC (see Figure 8c). This implies that
impure s does not participate in the competitive coupling structures.
Acknowledgements
We would like to thank Hosung Nam (Haskins Laboratories) for the fruitful
discussion on coupling structures for word initial consonant clusters in Italian
with and without impure s.
References
Baretti, G.
1832 English and Italian Dictionary. Part the Second. Florence: Cardinal
Printing Office.
Bertinetto, P.M.
2004 On the undecidable syllabification of /sC/ clusters in Italian: Con-
verging experimental evidence. Italian Journal of Linguistics/Rivista
di Linguistica 16: 349–372.
Abstract
This study proposes a task-dynamic gestural model of the Romanian hiatus sequence
/e.a/ and of diphthong /ea/, starting from the hypothesis that the temporal organization
of hetero- and tauto-syllabic vowel clusters can be modeled in terms of particular
coupling relations. For modeling hiatus /e.a/, stimuli were created with the oscillators
for vowels /e/ and /a/ coupled anti-phase (180-degrees) or on different cycles (360-
degrees), resulting in their sequential production. These stimuli were classified
perceptually by Romanian listeners as hiatus sequences. For modeling stressed diphthong /ea/
and its alternation with unstressed vowel /e/, stimuli were created with the oscillators
for vowels /e/ and /a/ coupled in-phase (0-degree), resulting in their synchronous pro-
duction, and with additional manipulations of dynamic parameters, intended to model
stress effects. The perceptual results showed that vowels /e/ and /a/ synchronously co-
ordinated were perceived as vowel /e/, when all dynamical parameters were kept
constant, and that a diphthong percept was triggered when the blending weight for /a/
was greater than for /e/, causing vowel /a/ to achieve its target closer to its
specification, to the detriment of vowel /e/. An acoustic analysis further showed a similarity
between the modeled stimuli and corresponding stimuli produced by Romanian native
speakers.
1. Introduction
(1) a. hiatus /e.a/   b. non-nuclear diphthong /ja/   c. nuclear diphthong /ea/
The test-case language selected for the temporal modeling of the structural
distinctions illustrated in (1) is Romanian, with extensions to other cross-
linguistic instances remaining a subject for future examination. Romanian
provides an interesting case for investigating this question in that nucleus
diphthong /ea/ contrasts both with the hiatus sequence /e.a/ and with the non-
nuclear diphthong /ja/ (cf. Chitoran 2001, for a language description and a
detailed discussion of these diphthongs' phonotactics). Furthermore, the nuclear
diphthongs have a mid quality, which makes them quite distinguishable from
non-nuclear diphthongs (Chitoran 2002).
The nuclear diphthong participates in a stress-conditioned alternation, shown
in (2). An interesting experimental finding was that alternating /e/ in (2b) was
realized acoustically more centralized than non-derived /e/ (3) (Marin 2005,
accepted). This difference was observed both at vowel onset and at mid-point.
At the same time, alternating /e/ was shown not to differ qualitatively from the
onset part of diphthong /ea/, while non-alternating /e/ and the onset part of the
diphthong differed significantly. At mid-point, the diphthong differed from
both alternating and non-alternating /e/, exhibiting more centralized formant
patterns than those of either alternating or non-alternating /e/. The qualitative
difference between the diphthong and non-alternating /e/ is not surprising
assuming a bi-vocalic representation of diphthongs, such as the one in (1c):
the difference between diphthong-onset and non-alternating /e/ could be
explained as a co-articulation effect of the diphthong's offset part (vowel /a/)
on its onset, an effect naturally absent in the case of non-alternating /e/.
Following this reasoning, the absence of an acoustic difference between alter-
nating /e/ and the diphthong's onset suggested that their properties at onset
were similar, namely in both cases their beginning consisted of vowel /e/
being co-produced with vowel /a/. Alternating /e/'s acoustic properties could
A gestural model of the temporal organization of vowel clusters in Romanian 179
therefore be the result of vowels /e/ and /a/ being co-produced with each
other, which could explain both the difference between alternating and non-
alternating /e/, and the lack of difference between alternating /e/ and diphthong
/ea/'s onset.
(2) Alternating roots:
a. Diphthong: ['sea.ra] 'the evening'
b. Alternating /e/: [se.'ra.ta] 'the evening party'
(3) Non-alternating roots:
a. ['se.ra] 'the greenhouse'
b. [se.ri.'ti.ka] 'the greenhouse-Diminutive'
Starting from this hypothesis, the current paper's aim is to explore the
extent to which the planning and execution of Romanian diphthong /ea/ (and
potentially similar units cross-linguistically) can be modeled in a way that
(a) is consistent with the kind of compositional phonological representation
shown in (1c), while at the same time being distinct from hetero-syllabic /e.a/,
(b) is capable of producing the acoustic patterns observed, and (c) can account
in a principled way for the alternation between diphthong /ea/ and alternating /e/.
In a preliminary gestural modeling study (Marin 2007), in which task-
dynamic modeled stimuli were categorized by native speakers, an /e/ vowel
percept was obtained when the constrictions/activation intervals for vowels
/e/ and /a/ were fully overlapped, and a diphthong percept when the activation
intervals for vowels /e/ and /a/ were overlapped for approximately 90% of
their movement. When the activation intervals for /e/ and /a/ did not overlap
at all, the resulting percept was hiatus /e.a/. These previous results suggested
that both alternating /e/ and the diphthong could be modeled as vowels /e/ and
/a/ whose constriction movements were (almost) fully overlapped, with the
difference that in the presence of stress, vowel /a/ would presumably be realized
slightly longer and spatially stronger, and hence not fully blended with the
movement for vowel /e/. In contrast, the hiatus sequence /e.a/ could be modeled
as two vowels fully sequential (rather than overlapped).
These temporal relations as a function of syllable organization can be
formalized in terms of specific phasing relations (or coupling modes) between
the respective vowels. For many types of skilled actions, it has been shown
that two coupling modes, in-phase and anti-phase, require no learning and
can be stably maintained (Haken et al. 1985; Turvey 1990). If the planning
clocks responsible for triggering two actions are coupled in-phase, the actions
will be triggered synchronously; if the two clocks are coupled anti-phase, one
action will be triggered after the other, with a lag equal to half the clock
180 Stefania Marin and Louis Goldstein
period; finally, if two actions are coupled in-phase but on different cycles (i.e.
they are 360-degrees-coupled), their onsets will lag by a complete clock
cycle, and the two actions will be triggered fully sequentially. It has been
hypothesized that speech employs these intrinsic coupling modes as well, and
that syllable structure could be understood in terms of these specic modes of
coordination (Browman and Goldstein 2000; Byrd et al. 2009; Goldstein et al.
2006; Krakow 1999; Marin and Pouplier 2010; Nam 2007; Nam et al. 2009).
This approach provides a principled and economical way of understanding
temporal organization in speech production, by making use of coupling rela-
tions between planning oscillators, assumed to play a role not only in speech
but in coordinated human action in general. Thus, while in the study discussed
above (Marin 2007) the distinction between alternating /e/, diphthong /ea/ and
hiatus /e.a/ was achieved informally by manipulating temporal overlap, the
present study aims to model these linguistic categories as arising from lawful
consequences of specific inter-gestural coupling modes.
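The lag implied by each coupling mode is a fixed fraction of the planning-clock period; a minimal sketch (the 250 ms period is an arbitrary illustrative value, not a figure from this study):

```python
def onset_lag(coupling_deg, clock_period_ms):
    """Lag between two actions' onsets implied by their coupling phase:
    in-phase (0) -> synchronous; anti-phase (180) -> half a period;
    360-degree coupling -> a full period (fully sequential)."""
    return coupling_deg / 360.0 * clock_period_ms

period = 250.0  # illustrative planning-clock period (ms)
print(onset_lag(0, period))    # 0.0   -> synchronous, diphthong-like
print(onset_lag(180, period))  # 125.0 -> half-period lag
print(onset_lag(360, period))  # 250.0 -> fully sequential, hiatus-like
```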
Specifically, we hypothesize that the temporal pattern exhibited by hiatus
sequences with little to no overlap between the vowel activations could be
modeled as a 360-degree coupling such that movement for vowel /a/ begins
roughly when movement for vowel /e/ ends. As for diphthong /ea/ and its
stress-conditioned alternation with /e/, it is hypothesized that the overlap
pattern shown previously (Marin 2007) to result in the percept of /e/ or /ea/
can be modeled as the result of in-phase coupling between the two vowel
actions. Whether this coordination mode results in the percept of a vowel or
of a diphthong should be determined by additional dynamic parameters,
whose exact nature is the experimental focus of this paper. This analysis
entails that the hiatus and the diphthong are compositionally similar, but dis-
tinguishable in terms of the coupling relations, and hence specific timing,
holding between their composing vowel actions. It also entails that the alterna-
tion between diphthong /ea/ and alternating /e/ is not structural, but the result
of different dynamical parameters governing the same vowel actions. To test
this analysis, the current study presents a gestural modeling of diphthong /ea/,
its alternation with vowel /e/, and its contrast with hiatus /e.a/. The modeled
stimuli are evaluated both perceptually (Experiments 1 and 2), and by com-
paring the acoustic properties of modeled stimuli with those of corresponding
stimuli produced by native speakers (Experiment 3).
Non-nuclear diphthongs (1b) are not considered in this paper: as onset-
nucleus or nucleus-coda structures (cf. Chitoran and Hualde 2007) they are
assumed to be organized temporally as onsets/codas with a consonantal glide.
The computational model used in the current study, the Task-Dynamic Appli-
cation (TADA), is a gesture-based system developed at Haskins Laboratories
to test hypotheses formulated within dynamical speech production models,
such as Articulatory Phonology (Browman and Goldstein 1990; Browman et
al. 1984; Goldstein et al. 2006; Nam et al. n.d.; Saltzman and Munhall 1989).
TADA generates speech outputs on the basis of dynamical specifications of
articulatory gestures (as speech action units) and the coupling relations among
their clocks, which serve as information for computing a gestural score with
precise activation intervals for each gesture. Articulator movement then results
from imposing a set of dynamical controls on the articulators. The resulting
articulator trajectories are in turn used to compute vocal tract shapes, area
functions, and ultimately, sound via the pseudo-articulatory synthesizer HLSyn
(Hanson and Stevens 2002).
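The "dynamical controls" mentioned here are standardly modeled as critically damped second-order (point-attractor) systems driving each tract variable to its target (Saltzman and Munhall 1989). A minimal Euler-integration sketch, with arbitrary illustrative parameter values rather than TADA's actual settings:

```python
def simulate_gesture(x0, target, omega=40.0, dt=0.001, steps=300):
    """Critically damped point attractor:
    x'' = -2*omega*x' - omega**2 * (x - target).
    Returns the tract-variable trajectory (arbitrary units)."""
    x, v, traj = x0, 0.0, []
    for _ in range(steps):
        a = -2.0 * omega * v - omega ** 2 * (x - target)
        v += a * dt
        x += v * dt
        traj.append(x)
    return traj

traj = simulate_gesture(x0=0.0, target=1.0)
print(round(traj[-1], 3))  # 1.0: reaches the target without overshoot
```

Critical damping is what makes gestural movements approach their constriction targets smoothly; gestural scores then switch such systems on and off over their activation intervals.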
3. Experiment 1
shared articulator, and achieve its constriction closer to its underlying target, to
the detriment of unstressed gestures (cf. also the insights in Lindblom 1963,
and more recently de Jong's 1995 model of stress as hyperarticulation).
Because F0 movement is controlled primarily by placement of prosodic
pitch accents rather than by lexical stress, per se (cf. Beckman and Edwards
1994; Sluijter and van Heuven 1996), it was not considered here. Vowel quality
was also not assumed to be a relevant cue for encoding stress in Romanian,
given previous impressionistic descriptions and empirical evidence showing
that vowel /e/ in Romanian does not differ qualitatively as a result of stress
(Marin accepted). On the basis of these considerations, three parameters were
tested for modeling stress effects: activation interval of the vowel gestures
(affecting the vowels' relative duration), relative blending weight of the two
vowel gestures (determining the vowels' relative articulatory strength), and
presence of a prosodic gesture (Byrd and Saltzman 2003) slowing the time
course of speech production, and resulting in lengthening of the affected con-
striction. Each of these parameters will now be considered in more detail.
The activation interval of the two relevant vowel gestures determines the
time between activation onset and offset of each vowel. The coupled oscillator
model specifies the phase at which a gesture is activated relative to another,
while de-activation by default occurs at some regular phase of the gesture's
own clock (340 degrees for vowels). Thus two vowels coupled in-phase are
synchronous at activation onset, and by default (i.e. determined by their own
internal clocks) also at offset. Activation offset was manipulated so that for
some stimuli offset of /e/ occurred earlier than offset of /a/ resulting in a rela-
tively shorter duration of /e/, mirroring the fact that in Romanian (and other
languages, cf. Lindblom 1963) /e/ is slightly shorter than /a/ (Burileanu 2002).
While differential duration of low vs. mid/high vowels is not per se a stress-
related property of these vowels, it was assumed that stress could affect the
movement, and hence the duration of an intrinsically longer low vowel more
than that of a shorter one. It must be noted that without a manipulation of
activation offset, only very small intrinsic vowel duration differences would
emerge automatically from the current implementation of the model.
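Such de-activation phases translate into activation durations as a fixed fraction of the gesture's clock cycle; a sketch, assuming an arbitrary illustrative 300 ms cycle (not a value from TADA):

```python
def activation_ms(deactivation_deg, cycle_ms=300.0, activation_deg=0.0):
    """Activation interval implied by onset and de-activation phases on the
    gesture's own clock (340 degrees being TADA's default for vowels)."""
    return (deactivation_deg - activation_deg) / 360.0 * cycle_ms

# Default /a/ de-activation vs. the shortened /e/ variants used below:
for phase in (340, 300, 270):
    print(phase, round(activation_ms(phase), 1))
```

With these numbers, the 300- and 270-degree variants shorten /e/'s activation by roughly 33 and 58 ms relative to the 340-degree default, which is the kind of duration asymmetry between /e/ and /a/ the manipulation is meant to produce.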
A second manipulation was the relative blending weight of the two vowels.
In the prosodic component of TADA currently under development, stress is
modeled, in part, by means of a spatial modulation gesture (the so-called μ-
gesture) which serves to make stressed gestures more extreme, achieving con-
strictions closer to their underlying target values, in comparison to unstressed
gestures which may show more target undershoot (Saltzman et al. 2008). In
the currently available version of TADA, in which μ-gestures are not yet im-
plemented, their effect can be approximated for the case of two synchronous
3.2. Method
3.2.1. Participants
Twelve native Romanians, naïve to the purpose of the experiment, and with
no reported speech, hearing or language deficits participated in this auditory
perception task.
were modeled throughout using the default TADA specifications for vowels
[] and [a] respectively, matching the phonetic characteristics of Romanian /e/
and /a/ (cf. Chitoran 2001). All the stimuli had an initial and nal labial stop
/b/ flanking the relevant vowels (/b_b/).
In addition to the coupling relations between vowels /e/ and /a/, three addi-
tional parameters, assumed to play a role in modeling stress (and hence the
stress-conditioned alternation /'ea/-/e/), were manipulated. One manipulation
was changing vowel activation offset for some items so that offset of /e/
occurred earlier than offset of /a/; thus, for some stimuli, vowel de-activation
occurred at 340 degrees on the cycle of either /e/ or /a/, while other stimuli
were created with earlier de-activation of /e/, at 300 or 270 degrees, resulting
in a shorter activation interval. De-activation for /a/ was kept constant at 340
degrees. A second manipulation was the relative blending weight of the two
vowels' targets: for some stimuli both vowels /e/ and /a/ had the same blending weight
(i.e. a blending weight ratio BWR of 1), while for the other stimuli /a/, as the
vowel more affected by stress, had twice the blending weight of /e/ (resulting
in a BWR of 2). A third manipulation was presence or absence of a prosodic
gesture on a stimulus. When present, the π-gesture was active for the entire
vowels' activation duration, and its strength was flat throughout (when the
two vowels had different activation durations, the π-gesture's activation coin-
cided with the longer vowel's one). Tables 1 and 2 provide a full description
of the modeled stimuli.
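Target blending for synchronous gestures sharing an articulator is standardly a weighted average of the competing targets (Saltzman and Munhall 1989); the sketch below uses invented tract-variable values, not TADA's actual numbers, to show how a blending weight ratio (BWR) of 2 pulls the joint target toward /a/.

```python
def blended_target(target_e, target_a, weight_e=1.0, weight_a=1.0):
    """Weighted average of two competing targets for a shared articulator."""
    return (weight_e * target_e + weight_a * target_a) / (weight_e + weight_a)

# Invented tongue-height targets: /e/ = 6.0, /a/ = 0.0 (arbitrary units)
print(blended_target(6.0, 0.0, 1.0, 1.0))  # 3.0: equal compromise (BWR 1)
print(blended_target(6.0, 0.0, 1.0, 2.0))  # 2.0: BWR 2 favours /a/
```

This is the sense in which doubling /a/'s blending weight lets /a/ achieve its constriction closer to its specification, at the expense of /e/.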
Acoustic outputs with an 11025 Hz sampling frequency were generated on the basis of these articulatory configurations, and they were classified on the basis of auditory perception by 12 listeners (five male). The experiment was carried out in a quiet room and the participants were fitted with headphones.
DMDX software (K. Forster and J. Forster 2003) was used for stimulus presentation and response recording. The stimuli included the bilabial closures flanking the vowel interval of interest. A forced-choice identification design
was used, in which the listeners had to decide, by pressing an appropriately
labeled computer key, whether the item heard was a) part of two syllables
(BE AB), or contained b) diphthong /ea/ (BEAB), c) vowel /e/ (BEB), or d)
vowel /a/ (BAB). None of the choices were real words in Romanian. In the
written instructions, the participants were presented with real word examples
of the categories and were told that they would hear fragments of computer
synthesized words containing those categories in the context /b_b/. The pro-
gram advanced to the next stimulus as soon as a response key was pressed or
after 6.1s. Ten repetitions of each stimulus were included in the experiment,
presented in random order.
Table 1. Description of stimuli with single vowel gestures, and with two vowel
gestures coupled anti-phase or 360-degree used in Experiment 1.
Table 2. Description of stimuli modeled with two vowel gestures coupled in-phase
used in Experiment 1.
3.3. Results
The perceptual results averaged across listeners showed that single vowel
stimuli were perceived as either vowels /e/ or /a/ over 90% of the time (Figure
1a). Stimuli with vowels timed non-synchronously were perceived as hiatus
stimuli more than 85% of the time, with individual listeners ranging between 70% and 100% hiatus responses to ea180 stimuli and between 80% and 100% hiatus responses to ea360 stimuli.
For the stimuli with vowels coupled in-phase (Figure 1b), the identification patterns showed that neither different activation intervals of the two vowels nor presence of a π-gesture alone (nor a combination of the two) made a difference in how they were perceived. Stimuli with these manipulations alone were overall identified as vowel /e/ 90% of the time, similar to the identification pattern of the stimulus with no manipulation (stimulus ea). As to the blending weight parameter, there was a trend towards increasingly identifying as diphthongs those stimuli for which /a/ had greater blending weight. Thus, W2-stimuli were identified as a diphthong on average 35–40% of the time, with the additional presence of a π-gesture slightly enhancing this effect. Individual participant patterns, shown in Table 3, indicated that differential blending weight, independent of the other manipulations, triggered a diphthong response at a 50% or greater level for about half of the participants, while it did not trigger a diphthong response for the other half of the participants. The perceptual pattern indicated therefore that greater blending weight for vowel /a/ was the manipulation most influencing a diphthong percept (albeit not for all listeners), independent of vowel activation duration or π-gesture.
To quantify these observations, we carried out a generalized linear mixed model analysis with the individual (non-averaged) classification responses as the dependent variable (two levels: diphthong vs. any other response), stimulus as a fixed factor, and participant as a random factor. This analysis showed that stimuli with a blending weight ratio of 2 were classified as a diphthong significantly more than the base ea stimulus (cf. Table 4), confirming the trend in diphthong response observed on the averaged data.
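In such a model the per-trial probability of a diphthong response is an inverse-logit of a fixed stimulus effect plus a listener-specific random intercept. The following toy sketch only illustrates that structure; the coefficients are made up, and the actual estimation was done with a fitted GLMM, not reproduced here.

```python
import math

def p_diphthong(stimulus_effect, listener_intercept):
    """Inverse-logit link: per-trial probability of a 'diphthong' response."""
    return 1.0 / (1.0 + math.exp(-(stimulus_effect + listener_intercept)))

BASE = -3.0        # hypothetical intercept: base 'ea' stimulus, average listener
BWR2_SHIFT = 2.5   # hypothetical fixed-effect shift for a BWR-2 stimulus

# An average listener rarely hears the base stimulus as a diphthong,
# but a positive stimulus effect raises that probability substantially:
print(round(p_diphthong(BASE, 0.0), 3))
print(round(p_diphthong(BASE + BWR2_SHIFT, 0.0), 3))
```

The random intercept lets individual listeners sit higher or lower on this curve, which is how the model accommodates the split between diphthong-hearing and non-diphthong-hearing participants.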
The shift from a vowel to a diphthong percept for those listeners exhibiting it was not due to stimulus duration. Stimuli with a π-gesture were the longest, but this duration difference alone did not trigger a predominant diphthong response (cf. the stimuli represented by circles in Figure 2). While the stimuli with a combined BWR of 2 and presence of a π-gesture were indeed longer, and more consistently perceived as diphthongs (the triangle-stimuli in Figure 2), so were some considerably shorter stimuli where only blending weight had been manipulated (the diamond-stimuli in Figure 2).
Table 3. Individual diphthong responses (%) for Experiment 1 for the stimuli with
vowels coupled 0-degree in-phase. Diphthong responses at or over 50% are
bold-faced.
3.4. Discussion
The results of the classification showed that stimuli with vowels /e/ and /a/
coupled 180-degree or 360-degree were perceived as a hiatus, while stimuli
with vowels /e/ and /a/ coupled in phase were perceived as either vowel /e/ or
diphthong /ea/, depending on further manipulations. The parameter separating
a diphthong percept from a vowel percept, at least for some of the listeners,
was the blending weight ratio between the two in-phase vowel gestures.
When vowel /a/ received extra blending weight the resulting percept was, for
about half of the listeners, preponderantly a diphthong, while equal blending
weight resulted in a single vowel percept. However, none of the stimuli created
were classified consistently by all listeners as a diphthong, possibly because the
blending weight ratio between /e/ and /a/ was not large enough. We hypothe-
sized that an even larger blending weight ratio would result in a more robust
diphthong percept. We investigated this possibility in Experiment 2.
Table 4. Statistical results (Generalized Linear Mixed Model) for the diphthong
response comparison across stimuli with vowels coupled in-phase (Experiment
1). Positive Z-values indicate that there were more diphthong responses for
the given stimulus than for the base stimulus (stimulus ea).
Stimulus          Z        p-value
Intercept (ea)    -6.119   0.000
ea30               0.733   0.464
ea27               1.369   0.171
ea_W2              5.525   0.000
ea30_W2            5.693   0.000
ea27_W2            5.206   0.000
ea_W2_π            6.383   0.000
ea30_W2_π          5.915   0.000
ea27_W2_π          5.651   0.000
ea_π               0.733   0.464
ea30_π             1.303   0.193
ea27_π             1.548   0.122
4. Experiment 2
4.1. Method
4.1.1. Participants
Sixteen native Romanians, naïve to the purpose of the experiment, and with no reported speech, hearing or language deficits participated in this auditory perception task. Eleven of the listeners (M4–M8, F4–F9) also participated in Experiment 1.
Figure 2. Relationship between diphthong responses averaged across listeners (%) and
duration of vowel interval of stimuli (ms). Each /ea/ category is represented
by three values, corresponding to the activation interval manipulation.
1), to a stimulus with blending weight for /a/ six times greater than that of /e/
(i.e. a BWR of 6). The other specifications of the two vowels were otherwise
kept constant. The vowels of interest were synthesized in the context /b_b/.
The stimuli thus modeled were classied on the basis of auditory percep-
tion by 16 listeners. The same overall procedure as in Experiment 1 was
used. This time the participants had to decide whether the item heard con-
tained a) diphthong /ea/ (BEAB), b) vowel /e/ (BEB), or c) vowel /a/ (BAB).
The hiatus option was excluded as a choice on the basis of the experimenters' auditory evaluation of the stimuli. Eleven of the participants (M4–M8, F4–F9)
also completed Experiment 1 in the same session. The stimuli were presented
ten times in random order.
4.2. Results
On average, listeners perceived stimuli with (near) equal weight as vowel /e/
over 90% of the time, stimuli with a BWR of 5 to 6 as vowel /a/ over 90% of
the time, and stimuli with a BWR between 3 and 4 as diphthong /ea/ at least
50% of the time (Figure 3). Listeners varied slightly with respect to where in
the continuum their perception switched to diphthong /ea/ (cf. the individual
diphthong responses in Table 5); however, 15 of the participants heard the item with a BWR of 4 as a diphthong at least 80% of the time. One participant
(F3) showed a different pattern: for this listener, a BWR of 2 was enough to
trigger a diphthong percept. The participants in both perceptual experiments
showed a consistent response pattern to the common stimuli (ea and the
stimulus with a BWR of 1, and ea_W2 and the stimulus with a BWR of 2
respectively).
A generalized linear mixed model analysis with the individual classification responses as the dependent variable (two levels: diphthong vs. any other response), stimulus as a fixed factor, and participant as a random factor statistically corroborated our general findings. The stimuli with a BWR between 2 and 4.5 were classified significantly more as a diphthong than the stimulus with a BWR of 1 (Z > 5.29, p < 0.001, cf. Table 6). Additionally, stimuli with a BWR between 3 and 4 were more often classified as a diphthong, compared to the stimulus with a BWR of 2 (Z > 3.72, p < 0.001, Table 6). Finally, there were more diphthong responses to the stimulus with a BWR of 4 than to the stimulus with a BWR of 3 (Z = 6.03, p < 0.001), 3.5 (Z = 5.02, p < 0.001) or 4.5 (Z = 6.85, p < 0.001). These tests, which factored in the listener-specific differences, showed that the diphthong responses significantly increased starting from the stimulus with a BWR of 2, reached a maximum at
192 Stefania Marin and Louis Goldstein
the stimulus with a BWR of 4, and then decreased again.2 A one-sample t-test carried out on the percentage responses of each participant showed that the diphthong responses to the stimulus with a BWR of 4 were significantly higher than 50% (t(15) = 9.73, p < 0.001), indicating that for this stimulus the diphthong response consistently outnumbered either of the other two possible responses (vowel /e/ or vowel /a/).
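The one-sample t-test reduces to simple arithmetic on the per-participant percentages. A minimal sketch, using made-up response percentages rather than the actual data:

```python
import math
import statistics

def one_sample_t(values, mu0):
    """One-sample t statistic: (mean - mu0) / (SD / sqrt(n)), with df = n - 1."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical diphthong-response percentages for 16 listeners:
pcts = [88, 92, 80, 95, 85, 90, 79, 93, 86, 91, 84, 96, 82, 89, 87, 94]
t = one_sample_t(pcts, 50.0)
print(f"t(15) = {t:.2f}")
```

With percentages clustered well above 50, the statistic is far beyond the critical value for df = 15, mirroring the pattern reported above.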
4.3. Discussion
Experiment 2 showed that manipulating relative blending weight of two
synchronously timed vowels triggered a perceptual switch from a monophthong
to a diphthong. Equal blending weight for vowels /e/ and /a/ coupled in-phase resulted in an /e/ percept, while a blending weight ratio greater than 5 resulted in the percept of vowel /a/; finally, a blending weight ratio around 4 resulted
2. Given the robust statistical results (p-values for most comparisons either under 0.001 or over 0.1), the alpha levels were not corrected for multiple testing. However, even by using the conservative Bonferroni correction, 50 tests would have to be carried out before an observed p-value of 0.001 would result in a familywise error rate of 0.05. Our main patterns of (non-)significance would remain the same even after using such a correction.
Table 6. Statistical results (Generalized Linear Mixed Model) for the diphthong
response comparison across the stimuli tested in Experiment 2. Positive
Z-values indicate that there were more diphthong responses for the given
stimulus than for the base stimulus (stimuli with BWR of 1 and 2
respectively).
5. Experiment 3
5.1. Method
For the comparison of the acoustic characteristics of natural and modeled
stimuli, we used the word series in (4). The natural data were produced by 12
native speakers of Romanian (five male), who read the stimuli, embedded in a constant carrier phrase, ten times in random order, and at a self-selected casual speaking rate. The target words were separated by unrelated filler words,
embedded in the same carrier phrase. All the recordings were sampled at
22.05 kHz. The same stimuli were modeled using TADA: Both diphthong and alternating /e/ words were modeled with the gestures for vowels [ɛ] and [a] coupled in-phase, either with equal blending weight for alternating /e/ (henceforth blended /e/) (similar to the stimulus with a BWR of 1 in Experiment 2), or with a BWR of 4 for the diphthong (similar to the stimulus with a BWR of 4 in Experiment 2). Non-alternating /e/ was modeled with a single gesture for vowel [ɛ] (similar to stimulus e in Experiment 1). Acoustic outputs were generated on the basis of these articulatory configurations.
(4) Diphthong: ['sea.ra] 'the evening'
    Alternating /e/: [se.'ra.ta] 'the evening party'
    Non-alternating /e/: ['se.ra] 'the greenhouse'
The acoustic outputs of both natural productions and modeled stimuli were analyzed using Praat speech analysis software (Boersma and Weenink 2009). The vocalic interval was manually labeled from the onset to the offset of the vowel-specific formant contours, and formant frequencies for five formants were automatically calculated using Praat's short-term spectral analysis function. The frequency values, in Hertz, for the first two formants at the onset of the measured interval, at its offset, and every 10% into the interval, totaling eleven measuring points, were used in the analysis. Onset and offset points were manually determined, while the other points were determined automatically on the basis of onset and offset landmarks. Following the methodology
5.2. Results
A comparison of the model stimulus formant trajectories with the trajectories averaged across male and female speakers' productions showed comparable acoustic patterns for naturally produced tokens and model stimuli (Figure 4).
While precise values for F1 and F2 differed to some extent between produc-
tions by male speakers, by female speakers and by the model, the general
patterns for stimuli types were similar in that both F1 and F2 trajectories for
alternating /e/ were (slightly) more extremely front (higher F2, lower F1) than
those for diphthong /ea/, and less extreme than those for non-alternating /e/.
The Euclidean distance analysis confirmed the acoustic similarity between natural and modeled stimuli. Naturally produced diphthong words were closest
acoustically to the modeled diphthong: for the word ['sea.ra], the distance
E['sea.ra] had smaller values (Median = 270) than either E['se.ra] (Median = 379)
or E[se.'ra.ta] (Median = 294). Likewise, naturally produced alternating /e/ stimuli
were closer to modeled blended /e/ (Median = 241) than to either the diphthong
(Median = 277) or mono-gestural /e/ (Median = 338), and natural non-alternat-
ing /e/ stimuli were closer to modeled mono-gestural /e/ (Median = 246) than
Figure 4. Vowel F1 and F2 trajectories of the words ['sea.ra], [se.'ra.ta], and ['se.ra],
plotted on the basis of values measured at onset (0%), offset (100%), and
every 10% into the vowel interval, as produced by the model, by male
speakers and by female speakers.
Table 7. Statistical results (Wilcoxon Signed Ranks test) for the comparisons between Euclidean distances. Two-tailed significance is reported. Effect size (r) was calculated on the basis of Z-scores.
to either the diphthong (Median = 378) or blended /e/ (Median = 325). Paired-samples Wilcoxon Signed Ranks tests, summarized in Table 7, confirmed that for each word the smallest Euclidean distance, namely the one matching in category, was significantly smaller than the distance next up in value, validating the consistency of the pattern across speakers.
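One plausible reading of the Euclidean distance measure, sketched here under that assumption: each token is treated as a point in the 22-dimensional space of F1 and F2 values at the eleven proportional time points, and tokens are compared by their distance in that space. The trajectories below are hypothetical, not measured data.

```python
import math

def formant_distance(traj_a, traj_b):
    """Euclidean distance between two formant trajectories, each a list of
    (F1, F2) pairs in Hz sampled at the same proportional time points."""
    assert len(traj_a) == len(traj_b)
    return math.sqrt(sum((f1a - f1b) ** 2 + (f2a - f2b) ** 2
                         for (f1a, f2a), (f1b, f2b) in zip(traj_a, traj_b)))

# Hypothetical 11-point trajectories: a natural token vs. two model stimuli.
natural = [(400 + 20 * i, 2000 - 60 * i) for i in range(11)]
model_a = [(410 + 20 * i, 1990 - 60 * i) for i in range(11)]  # close match
model_b = [(500 + 10 * i, 1700 - 30 * i) for i in range(11)]  # poor match
print(formant_distance(natural, model_a) < formant_distance(natural, model_b))
```

Classifying each natural token with the model stimulus at the smallest such distance is what the median comparisons above summarize across speakers.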
5.3. Discussion
The observed acoustic similarity between model stimuli and natural tokens
could be taken as an indication of a comparable similarity at the production
level, and thus the gestural configuration probably employed in natural production could be inferred from the known gestural configuration employed in
the model. It is then plausible that natural tokens were produced similarly to
the modeled ones, with the gestures for vowels /e/ and /a/ coupled in-phase
both for alternating /e/ and diphthong /ea/, but with equal or different blending
weights, as a function of absence or presence of stress. Additionally, the fact that natural alternating /e/ was acoustically more similar to the bi-gestural /e/ in modeled [se.'ra.ta] than to the mono-gestural /e/ in ['se.ra] suggests that indeed production of alternating /e/ may involve two gestures, rather than just one.
Alternatively, the difference between [se.'ra.ta] and ['se.ra] could have been modeled as a difference in target specifications (specifically, the target for [se.'ra.ta] could be set to the post-blending targets of the BW1 model stimulus), rather than as a difference in gestural composition. The present model, while fitting the natural data reasonably well, has nevertheless the additional advantage of capturing the lexical relationship between [se.'ra.ta] and ['sea.ra] as a possible source for the difference between [se.'ra.ta] and ['se.ra].
6. Conclusion
Acknowledgements
References
Cho, Taehong
2004 Prosodically conditioned strengthening and vowel-to-vowel coarticulation in English. Journal of Phonetics 32: 141–176.
Collier, René, Fredericka Bell-Berti and Lawrence J. Raphael
1982 Some acoustic and physiological observations on diphthongs. Language and Speech 25: 305–323.
Davis, Stuart and Michael Hammond
1995 On the status of on-glides in American English. Phonology 12: 159–182.
Forster, K.L. and J.C. Forster
2003 A Windows display program with millisecond accuracy. Behavior Research Methods, Instruments, & Computers 35: 116–124.
Fowler, Carol A.
1981 Production and perception of coarticulation among stressed and unstressed vowels. Journal of Speech and Hearing Research 46: 127–139.
Goldstein, Louis, Dani Byrd and Elliot Saltzman
2006 The role of vocal tract gestural action units in understanding the evolution of phonology. In Michael A. Arbib (ed.), Action to Language via the Mirror Neuron System, 215–249. Cambridge: Cambridge University Press.
Haken, H., J.A.S. Kelso and H. Bunz
1985 A theoretical model of phase transitions in human hand movements. Biological Cybernetics 51: 347–356.
Hanson, Helen M. and Kenneth N. Stevens
2002 A quasi-articulatory approach to controlling acoustic source parameters in a Klatt-type formant synthesizer using HLSyn. Journal of the Acoustical Society of America 112: 1158–1182.
Harrington, Jonathan, Felicitas Kleber and Ulrich Reubold
2008 Compensation for coarticulation, /u/-fronting, and sound change in standard southern British: An acoustic and perceptual study. Journal of the Acoustical Society of America 123: 2825–2835.
Harrington, Jonathan, Janet Fletcher and Corinne Roberts
1995 An analysis of truncation and linear rescaling in the production of accented and unaccented vowels. Journal of Phonetics 23: 305–322.
de Jong, Kenneth J.
1995 The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. Journal of the Acoustical Society of America 97: 491–504.
Kaye, Jonathan D. and Jean Lowenstamm
1984 De la syllabicité. In François Dell, Daniel Hirst, Jean-Roger Vergnaud (eds.), La forme sonore du langage, 123–159. Paris: Hermann.
Krakow, Rena
1999 Physiological organization of syllables: a review. Journal of Phonetics 27: 23–54.
Lindblom, Björn
1963 On vowel reduction (Report No. 29). Stockholm, Sweden: The Royal Institute of Technology, Speech Transmission Laboratory.
Marin, Stefania
accepted Romanian blended vowels: A production model of incomplete neutralization. In Selected papers of the PaPI 2009. Mouton de Gruyter.
Marin, Stefania
2007 An articulatory modeling of Romanian diphthong alternations. In Jürgen Trouvain and William J. Barry (eds.), Proceedings of the XVIth International Congress of Phonetic Sciences, 453–456. Saarbrücken, Germany.
Marin, Stefania
2005 Complex Nuclei in Articulatory Phonology: The Case of Romanian Diphthongs. In Randall Gess and Edward J. Rubin (eds.), Selected papers of the Linguistic Symposium in Romance Languages 34th, 161–177. Amsterdam, Philadelphia: John Benjamins.
Marin, Stefania and Marianne Pouplier
2010 Temporal organization of complex onsets and codas in American English: Testing the predictions of a gestural coupling model. Journal of Motor Control 14: 380–407.
Marotta, Giovanna
1988 The Italian diphthongs and the autosegmental framework. In Pier Marco Bertinetto and Michele Loporcaro (eds.), Certamen Phonologicum, 389–420. Torino: Rosenberg & Sellier.
Mooshammer, Christine and Susanne Fuchs
2002 Stress distinction in German: Simulating kinematic parameters of tongue tip gestures. Journal of Phonetics 30: 337–355.
Nam, Hosung
2007 A Gestural Coupling Model of Syllable Structure. PhD Dissertation, Department of Linguistics, Yale University.
Nam, Hosung, Louis Goldstein and Michael Proctor
n.d. TADA (TAsk Dynamics Application). Retrieved from http://www.haskins.yale.edu/tada_download/
Nam, Hosung, Louis Goldstein and Elliot Saltzman
2009 Self-Organization of syllable structure: A coupled oscillator model. In François Pellegrino, Egidio Marsico, Ioana Chitoran and Cristophe Coupé (eds.), Approaches to phonological complexity, 299–328. Berlin/New York: Mouton de Gruyter.
Pierrehumbert, Janet B.
2002 Word-specific phonetics. In Carlos Gussenhoven and Natasha Warner (eds.), Papers in Laboratory Phonology VII, 101–139. Berlin: Mouton De Gruyter.
Saltzman, Elliot and Kevin G. Munhall
1989 A dynamical approach to gestural patterning in speech production. Ecological Psychology 1: 333–382.
Abstract
This study investigates the temporal coordination of tones and constriction gestures in
Catalan and Viennese German using electromagnetic articulography. It is observed that
nuclear rises are later in German than in Catalan. We model the difference in tonal
alignment patterns using a coupled oscillator model, proposing that it can emerge
from differences in the coupling relations between tones and oral constriction gestures.
In Catalan, the high tone gesture is coupled in-phase with the accented vowel. In
German, a low tone and a high tone gesture compete with each other to be in-phase with the vowel, resulting in a rightward shift of the high tone gesture and therefore a delayed rise on the acoustic surface. We conclude with a comparison of lexical and
prosodic pitch accent tones and their interaction with the syllable-level coupling graph.
In contrast to lexical tones, prosodic tones do not perturb the within-syllable relations
of consonant and vowel timing.
1. Introduction
This study describes the temporal coordination pattern between tones and oral
constriction gestures in Catalan and German and attempts to analyze the
temporal pattern using a planning model of intergestural timing grounded in Articulatory Phonology. We will show that this coordination follows the basic principles applied to consonant clusters, which have been reported in numerous studies (Browman and Goldstein 1988, Honorof and Browman 1995, Byrd 1995, Bombien et al. 2010, Goldstein, Chitoran, and Selkirk 2007, Goldstein et al. 2009, Hermes et al. 2008, Marin and Pouplier 2010, Nam 2007,
Nam, Goldstein, and Saltzman 2009, Shaw et al. 2009).
Within the framework of Articulatory Phonology, speech can be decomposed
into invariant phonological units, articulatory gestures that are temporally
coordinated with one another (Browman and Goldstein 1989). A coupled
oscillator planning model of speech timing has been developed that provides
a possible way of modelling the coordination of gestures in time (Browman
and Goldstein 2000, Goldstein et al. 2009, Nam and Saltzman 2003, Nam,
Goldstein, and Saltzman 2009). In the model, gestures are associated with
nonlinear planning oscillators (or clocks) that are coupled with each other in a pattern specified by a coupling graph, assumed to be part of an utterance's phonological representation. In the present study, we model the control of
pitch to achieve a target in F0 as a tonal gesture and investigate the temporal
coordination of tonal gestures with oral constriction gestures in Catalan and
Viennese German (also referred to as Standard Viennese Austrian) bitonal LH
pitch accents.
It has been reported elsewhere that Catalan and German are expected to
show different alignment patterns for nuclear rises (Prieto et al. 2007b for
Catalan, Mücke et al. 2009 for Viennese German). We aim to test whether
those alignment differences can be seen as phonological in nature in the sense
that they emerge from topological differences in phonological coupling
graphs. Our results show that in the acoustic analysis, the accentual rise starts
later with respect to segmental landmarks in Viennese German compared to
Catalan. In the articulatory analysis, we focus on the start of the F0 rise move-
ment (the L valley) as the start of the H tone gesture. In Catalan, the start of
the H tone gesture is tightly synchronised with the start of the vowel gesture,
while in the German variety the H tone gesture starts considerably later. We
hypothesize that the difference lies in the coupling relations between tones
and vowel gestures. Therefore, we propose a non-competitive coupling structure type for Catalan, and a competitive structure (as known from consonant clusters) for Viennese German. The competitive coupling structure leads to a rightward shift of the H tone gesture (and therefore to later F0 rises on the acoustic surface).
We conclude with a discussion on the difference between prosodic (pitch
accent tones) and lexical tones and how they are supposed to interact with
the syllable-level coupling graphs for consonant and vowel coordination.
Spanish, D'Imperio, Petrone, and Nguyen 2007 for Italian, Atterer and Ladd 2004 and Mücke et al. 2008b for different German varieties, Ladd 2008 for a
general overview). Usually, tones occur in the vicinity of the lexically stressed
syllable carrying the tone. Therefore, tones are hypothesized to be aligned
with segments corresponding to the lexically stressed syllable. Figure 1 shows
the alignment properties of prenuclear rising pitch accents in different lan-
guages. The start of the rise (the L event) in English and Greek is constantly
aligned with the left periphery of the accented syllable, at the beginning of the
acoustic segment associated with the syllable-onset consonant. In fact, these
are not the only two languages reported in the literature with this pattern for
L (Ladd, Mennen, and Schepman 2000 for Dutch, D'Imperio 2002 for Italian, Prieto and Torreira 2007a for Spanish, Prieto et al. 2007b for Catalan). However, German has been shown to have later rises in prenuclear accents. In Standard Northern German (Low Franconian speech area near Düsseldorf), the L occurs around the middle of the C1 segment, while in Southern German (Viennese German) L occurs even later, during V1.
In Articulatory Phonology, speech gestures are modelled as invariant func-
tional units of vocal tract constricting action and speech can be decomposed
into a constellation of gestures: articulatory events with extent in time that
can temporally overlap with one another. The regularity and variability in
intergestural timing have been described by many studies (Byrd 1994, 1996a,b;
Cho 2001, Bombien et al. 2010). Such temporal patterns have been modelled
using an intergestural timing model, where the intergestural temporal relation-
ship (e.g. timing and connectivity) is specied in an inter-oscillator coupling
208 Doris Mücke, Hosung Nam, Anne Hermes and Louis Goldstein
Figure 2. Coupling graphs for (2a) pa (simple syllable onset, CV) and (2b) up
(simple syllable coda, VC) with in-phase (solid lines) and anti-phase (dotted
lines) target specications.
Figure 3. Coupling graphs for (3a) spa (complex syllable onset, CCV) and (3b) ask
(complex syllable coda, VCC) with in-phase (solid lines) and anti-phase
(dotted lines) target specications.
and Selkirk 2007, Goldstein et al. 2009, Hermes et al. 2008, Marin and Pouplier
2010, Nam 2007, Nam, Goldstein, and Saltzman 2009, Shaw et al. 2009).
In many languages, complex codas are defined by a non-competitive coupling structure, because of the weaker strength of anti-phase coupling. A
coupling graph for a VCC coordination in English (e.g. ask) is provided in
Figure 3b. Only C1 is linked directly to the V gesture; the coupling is in an
anti-phase relation. The following Cs are coordinated only with respect to
each other (anti-phase), but not directly to the V gesture. In what follows we will apply the basic coupling modes to the coordination of Tone gestures with oral constriction gestures.
A tone can also be understood as a coordinated articulatory action to achieve a tonal task goal and thus defined as a dynamical system in F0 space, a tone gesture (Gao 2009). Considering a tone as a gesture enables one to model the tone-to-gesture timing within the intergestural timing model. In a rising pitch accent, for example, the H tone gesture (or H gesture) involves a tonal movement to an H target in F0 (schematised in Figure 4). The onset of a
Figure 4. Analysis of a rising LH pitch accent contour: Tones as gestural action units
(above), and tones as events (below).
Tone gesture is taken to be the point in time at which F0 begins to move in the
direction of that gesture's target. In an LH rise, the onset of the H tone gesture
coincides with the offset of the preceding L tone gesture. In this example of
pitch accents, the beginning of the L gesture is unclear. Note that Tone ges-
tures (L and H gestures in Figure 4) are dynamical systems of control that
have extent in time (their activation intervals), while in the autosegmental
view, tones are events that occur at instants in time (H and L in Figure 4).
Gao (2009) extended the coupled oscillator model for intergestural timing to the analysis of the temporal patterns of lexical tones in Mandarin Chinese. She investigated syllables with single onsets (CV and CVC) such as [ma] or [man]. For syllables with only one tone (Tone 1 = H, Tone 3 = L), she showed that the
oral constriction gestures (C, V) and the Tone gestures (T) are activated in the
temporal order of C-V-T. The onset of the consonant gesture occurred con-
siderably (~50 ms) before the onset of the vowel gesture, while the onset of
the Tone gesture occurred after the vowel gesture, with about the same lag.
She demonstrated that this timing pattern of tones and constriction gestures
(C and V) can be predicted by hypothesizing that Tone gestures function
like C gestures in the competitive coupling topology in Figure 3a: the C and
T gesture are both coupled in-phase to the vowel and C and T are coupled in
anti-phase to one another. As a result, the C gesture is shifted leftwards with respect to V, while the Tone gesture is shifted rightwards (c-center-like coordination of C, V and T). The coupling graph for tones and oral constriction
gestures in Mandarin Chinese proposed by Gao (2009) is provided in Figure 5.
This hypothesis was further supported by the results of Tone 4 (HL). Here,
the H tone was synchronized with the V, while C preceded and L followed by
substantial lags. This pattern provided evidence that C-H-L are all coupled
anti-phase to one another and in-phase to the vowel.
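This competitive C–V–T topology can be sketched with relative-phase dynamics of the kind used in coupled oscillator planning models: each pairwise coupling pulls the relative phase toward its target, 0° for C–V and T–V and 180° for C–T. The equal coupling strengths, step size, and initial phases below are illustrative assumptions, not fitted model settings.

```python
import math

def settle(x0, y0, a=1.0, dt=0.001, steps=60000):
    """Euler-integrate the relative phases x = phase(C) - phase(V) and
    y = phase(T) - phase(V), with in-phase C-V and T-V coupling (target 0)
    and anti-phase C-T coupling (target 180 degrees), all with strength a."""
    x, y = x0, y0
    for _ in range(steps):
        dx = -2 * a * math.sin(x) + a * math.sin(x - y) - a * math.sin(y)
        dy = -2 * a * math.sin(y) + a * math.sin(y - x) - a * math.sin(x)
        x += dx * dt
        y += dy * dt
    return math.degrees(x), math.degrees(y)

# Starting slightly off-synchronous, C settles ahead of V and T behind it:
x_deg, y_deg = settle(-0.3, 0.3)
print(round(x_deg), round(y_deg))  # approximately -60 and +60
```

The relative phases settle symmetrically on either side of the vowel, C shifted leftwards and T rightwards, which is the c-center-like C-V-T ordering described above.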
In the present study, we extend the work on Tone gestures to pitch accent
tones, and we examine how these Tone gestures are temporally coordinated
with oral constriction gestures and with other Tone gestures. One hypothesis
Figure 5. Coupling graph for Tone 1 = H, Tone 3 = L in Mandarin Chinese, syllable [ma], adapted from Gao 2009. The Tone gesture (T) behaves like an
additional consonant (C).
Coupling of tone and constriction gestures in pitch accents 211
2. Method
         labial            alveolar
open     [mi.'ma.mi]       [ni.'na.ni]
         [mi.'ma.mi.la]    [ni.'na.ni.la]
closed   [mi.'mam.zi]      [ni.'nan.mi]
         [mi.'mam.zi.la]   [ni.'nan.mi.la]
Four target words were constructed with the lexically stressed syllable as the target syllable (see Table 2). Analogously to the Catalan data we varied syllable structure (open and closed syllables) and place of articulation of the
consonants (labial vs. alveolar). The phonological syllable structure was varied
by varying phonological vowel length, 'CV:CV vs. 'CVCV. In German, short
vowels do not occur in open syllables if they are stressed. Therefore, we
assume ambisyllabicity for the intervocalic C in the 'CVCV sequence (as
suggested by psycholinguistic experiments carried out by Schiller, Meyer,
and Levelt 1997, who have shown that Dutch speakers tend to close syllables
containing a short vowel).
                 labial           alveolar
open             [di # ma:.mi]    [di # na:.ni]
closed           [di # mami]      [di # nani]
F0 rise (the L valley) and the beginning of the initial C1 segment of the tonic
syllable (Tone-C1 segment).
Articulatory analysis: We identified articulatory labels for movements in the
vertical position time function of the respective sensors (lower lip for /m/,
tongue tip for /n/, and tongue body for the vowel). Algorithmically, we identi-
fied the times of onset and effective target achievement of consonant and vowel
gestures (and also offset for consonant gestures) at zero-crossings in the
velocity curve. Based on these algorithmically determined time points, we
measured temporal lags between the tone gestures and the oral constriction
gestures (V and C gestures) using the onsets of gestural activation, i.e. the
time points when gestures begin to move toward their targets. The labels V
and C gesture are used to refer to the onsets of the vowel and the initial con-
sonant gestures.
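The zero-crossing labeling can be sketched on a synthetic lip-aperture trajectory; the signal shape, the 200 Hz rate, and the small velocity threshold used to approximate gesture onset are our illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

dt = 0.005  # 200 Hz sampling
t = np.arange(0.0, 2.0, dt)
# synthetic lip-aperture signal: a closing-opening movement centred at t = 1.0 s
la = 10.0 - 8.0 * np.exp(-((t - 1.0) / 0.25) ** 2)
vel = np.gradient(la, dt)

# effective target achievement: velocity changes sign at maximum constriction
zc = np.where(np.diff(np.sign(vel)) != 0)[0]
target_idx = zc[np.argmin(la[zc])]

# gesture onset: first sample where |velocity| exceeds a small fraction of its peak
thresh = 0.02 * np.max(np.abs(vel))
onset_idx = np.argmax(np.abs(vel) > thresh)

print(round(t[onset_idx], 3), round(t[target_idx], 3))
```

On this synthetic signal the detected target achievement falls at the constriction maximum near 1.0 s, with the onset well before it.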
For both acoustic and articulatory landmarks, the temporal lag between
pairs of landmarks is reported as A-B. Thus, a negative value implies that A
occurs earlier than B, and vice versa for a positive value.
3. Results
Section 3.1 reports the acoustic and articulatory alignment patterns for the
nuclear LH rises in Catalan, and section 3.2 reports on the same for Viennese
German. We included all stimuli in the statistical analyses (acoustic and
articulatory).
Figure 6. Catalan acoustic (a) and articulatory (b–d) alignment latencies in ms, bilabial
data.
2. We treated labial and alveolar datasets separately to avoid the effects of intrinsic
variation (due to different organs) in timing patterns.
Table 3. Catalan mean lags (in ms) and standard deviations in parentheses for acoustic
(Tone-C1 segment) and articulatory alignment measures, separately for broad
and contrastive focus, all data. The articulatory measures include the lags
Tone-V gesture, Tone-C gesture and C-V gestures.
nously. The Tone gesture lagged the V gesture slightly (by 4 ms in the labial
data, Figure 6b, and by 2 ms in the alveolar data) and the C gesture by slightly
more: by 6 ms in the labial data, Figure 6c, and by 8 ms in the alveolar data.
Compatibly, the C gesture led the V gesture slightly (by 2 ms in the labial
data and by 5 ms in the alveolar data). Thus, gestural onsets occur in the order
C-V-T, but the lags are tiny.
Like the acoustic analysis, we tested the articulatory measures with three-
way ANOVAs (2 × 2 × 2) conducted separately for the labial and alveolar
datasets. There were no significant results for the labials.
However, in the alveolar dataset we found a main effect of Focus structure
on all measures: Tone-V gesture [F(1, 40) = 10.43, p < 0.01], Tone-C gesture
[F(1, 40) = 17.45, p < 0.001] and C-V gesture [F(1, 40) = 62.83, p < 0.001].
In contrastive focus compared to broad focus the tone starts 5 ms later in the
Tone-V measure, 9 ms earlier in the Tone-C measure and the V gesture starts
10 ms later in the C-V measure. Furthermore, there was also a main effect of
Foot Size on the measures Tone-V gesture [F(1, 40) = 12.96, p < 0.001] and
C-V gesture [F(1, 40) = 12.40, p < 0.01], but not on the measure Tone-C
gesture (p > 0.05). In a two-syllable foot compared to a three-syllable foot
218 Doris Mücke, Hosung Nam, Anne Hermes and Louis Goldstein
the tone starts 8 ms earlier in the Tone-V measure and the V gesture starts 7
ms earlier in the C-V measure.
Table 4 gives an overview of the effects found in the articulatory analysis
for Catalan.
                         bilabial                  alveolar
Catalan                  Tone-V  Tone-C  C-V       Tone-V  Tone-C  C-V
Syllable structure       ns      ns      ns        ns      ns      ns
Foot size                ns      ns      ns        ***     ns      **
Focus structure          ns      ns      ns        **      ***     ***
Place of articulation    ns      ns      ns        ns      ns      ns
To sum up, in the Catalan data, the C, V and Tone gestural onsets are very
close to being synchronous, occurring in the order of C-V-T. Furthermore, the
lags in the gestural analysis turned out to be less affected by prosodic factors
in the labial dataset compared to the alveolar dataset. However, a one-way
ANOVA on all data (labial and alveolar together) revealed no effect of Place
of Articulation on the respective measures (Tone-V gesture, p > 0.05; Tone-C
gesture, p > 0.05; C-V gesture, p > 0.05).
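A one-way ANOVA of this kind can be reproduced with SciPy; the lag values below are synthetic illustrations drawn around similar means for both places of articulation, not the study's measurements.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# synthetic Tone-V gesture lags (ms): similar underlying means for both places,
# mimicking the null result for Place of Articulation
labial = rng.normal(loc=4.0, scale=10.0, size=48)
alveolar = rng.normal(loc=2.0, scale=10.0, size=48)

f, p = stats.f_oneway(labial, alveolar)
print(f"F(1, {len(labial) + len(alveolar) - 2}) = {f:.2f}, p = {p:.3f}")
```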
Table 5. Viennese German mean lags (in ms) and standard deviations in parentheses
for acoustic (Tone-C1 segment) and articulatory alignment measures, con-
trastive focus, all data. The articulatory measures include Tone-V gesture,
Tone-C gesture and C-V gesture.
Figure 7. Viennese German acoustic (a) and articulatory (b–d) alignment latencies in
ms, bilabial data.
                         bilabial                  alveolar
Viennese German          Tone-V  Tone-C  C-V       Tone-V  Tone-C  C-V
Syllable structure       **      ***     ns        ns      **      ns
Place of articulation    **      ***     ***       **      ***     ***
Figure 9. Gestural score for Catalan ['ma.mi], broad focus. The figure is to scale and
based on means (for 10 tokens).
Figure 10. Gestural score for Viennese German ['ma:.mi], contrastive focus. The
figure is to scale and based on means (for 10 tokens).
On the other hand, results for Viennese German contrastive focus show that
the onset of the H tone gesture is delayed with respect to the V gesture (by
105 ms) and the C gesture (by 107 ms), illustrated in the gestural score in
Figure 10. However, the oral constriction gestures for C and V are still syn-
chronous (by 2 ms difference for the CV lag across all data). Only the Tone
gesture starts later.
To account for this difference between Catalan and Viennese German, we
hypothesized the distinct coupling graphs shown in Figure 11, and tested them
using the Haskins Laboratories task-dynamic speech production model (TaDA;
Nam et al. 2004). The graphs were input to the model, and different
gestural scores of Catalan and Viennese German were successfully generated,
showing the much later onset lag for Viennese German.
Figure 11. Gestural score and coupling graphs for Catalan (a) and Viennese German
(b); coupling graphs with in-phase (solid lines) and anti-phase (dotted lines)
target specifications.
As shown in Figure 11, L and H are sequentially ordered and thus coupled
in an anti-phase relation (dotted line) for both languages, Catalan and Viennese
German. The difference across the two languages lies in the coupling relation
between the tones and the V gestures.
In Catalan (Figure 11a), the H tone gesture is coupled in-phase with the
accented V gesture (see solid line). L is not directly coupled to the V and starts
at some point within the pretonic syllable. The vowel and the H gesture there-
fore begin simultaneously.
In Viennese German (11b) both tones, L and H, are in-phase with the
accented V, although they are of course sequenced with respect to each other.
This competitive coupling results in a rightward shift of the H gesture to make
room for the preceding L gesture, much like the competitive coupling in Figure
3a, in which the consonant shifts to the right to make room for the addi-
tional consonant (see Browman and Goldstein 1989, Browman and Goldstein
2000, Nam and Saltzman 2003, Nam 2007, Goldstein et al. 2009, Marin and
Pouplier 2008, see also Hermes et al., this volume).
Thus, we can provide a principled coupling account of the differences
across the two languages. This also allows us to see how the timing of the H
gesture is controlled in Viennese German. It is not synchronized with some
arbitrary time point, but rather its delay follows automatically from the com-
petitive topology of its graph.
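The two graphs in Figure 11 can be contrasted with the same kind of toy phase-settling sketch used for coupled-oscillator models (a simplification of ours, not the TaDA simulation reported above; coupling strengths and initial phases are assumed).

```python
import numpy as np

def settle(edges, theta0, steps=20000, dt=0.01):
    """Relax oscillator phases by gradient descent on the coupling potential."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        d = np.zeros(len(theta))
        for i, j, psi, a in edges:
            u = theta[i] - theta[j] - psi
            d[i] -= a * np.sin(u)
            d[j] += a * np.sin(u)
        theta += dt * d
    return theta

pi = np.pi
# oscillators: 0 = L tone, 1 = H tone, 2 = accented vowel V
# Catalan-like graph: H in-phase with V, L only anti-phase to H
th_cat = settle([(1, 2, 0.0, 1.0), (0, 1, pi, 1.0)], [0.5, 0.0, 0.0])
# Viennese-German-like graph: L and H both in-phase with V, L anti-phase to H
th_vie = settle([(0, 2, 0.0, 1.0), (1, 2, 0.0, 1.0), (0, 1, pi, 1.0)],
                [0.5, -0.5, 0.0])

# phase lead = earlier onset
print(np.degrees(th_cat[1] - th_cat[2]))  # close to 0: H synchronous with V
print(np.degrees(th_vie[1] - th_vie[2]))  # negative: H delayed relative to V
```

In the competitive (Viennese-German-like) graph the H phase settles behind V, while in the non-competitive (Catalan-like) graph H stays locked to V: the qualitative contrast the gestural scores show.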
However, in the autosegmental-metrical theory it is also possible to assume
that the rising LH pitch accent in Catalan has no leading (L) tone at all, and
simply analyse it as an H*. That kind of analysis would also involve a non-
competitive structure for the coupling of the H tone gesture with the vowel.
For Viennese German (the later rise), it would be possible to assume an L*H
instead of an LH* to account for the later alignment. But in German there
is no clear evidence for a categorical difference between L*H and LH* (see
discussion in Braun and Ladd 2003, Braun 2007).
It is interesting to note the similarities between the proposed coupling-
graph differences and the autosegmental association diagrams proposed by
Grice (1995), which treat bitonal pitch accents as sequences (Figure 12a) or
units (12b), analogously to consonant clusters or affricates respectively in the
segmental domain (Yip 1989).
The coordination of pitch accent tones in Viennese German and Catalan
(and resulting coupling graphs) differs in important ways from the lexical
tones in Mandarin as analyzed in Gao (2009). First consider Catalan vs.
Mandarin. In Mandarin, syllables with H or L tones are produced with
substantial (~50 ms) lags between C and V onsets and then between V and T
onsets. In Catalan, the C,V, and H gestures all begin synchronously. One inter-
Figure 12. (a) Cluster of 2 bitonal pitch accents with 2 tonal root nodes, (b) unit with a
branching tonal root node (Grice 1995).
coupling graphs. Much more data from more speakers and more languages
will of course be required to substantiate this idea.
Acknowledgements
The Catalan recordings were carried out in collaboration with Pilar Prieto,
ICREA-University Pompeu Fabra, Barcelona, Spain.
References
Byrd, D.
1995 C-centers revisited. Phonetica 52, 285–306.
Byrd, D.
1996a A Phase Window Framework for Articulatory Timing. Phonology
13(2), 139–169.
Byrd, D.
1996b Influences on Articulatory Timing in Consonant Sequences. Journal
of Phonetics 24(2), 209–244.
Cho, T.
2001 Effects of morpheme boundaries on intergestural timing: Evidence
from Korean. Phonetica 58(3), 129–162.
D'Imperio, M.
2002 Language-specific and universal constraints on tonal alignment: The
nature of targets and anchors. Proceedings of the 1st International
Conference on Speech Prosody, Aix-en-Provence, France, 101–106.
D'Imperio, M., Petrone, C. and Nguyen, N.
2007 Effects of tonal alignment on lexical identification in Italian. In C.
Gussenhoven and T. Riad (eds.), Tones and tunes, Vol. 2, Berlin:
Mouton de Gruyter, 79–106.
Gao, M.
2009 Gestural Coordination among Vowel, Consonant and Tone Gestures
in Mandarin Chinese. Chinese Journal of Phonetics. Beijing:
Commercial Press.
Goldstein, L., Chitoran, I. and Selkirk, E.
2007 Syllable structure as coupled oscillator modes: evidence from
Georgian vs. Tashlhiyt Berber. In Proceedings of the 16th
International Congress of Phonetic Sciences, Saarbrücken,
Germany, 241–244.
Goldstein, L., Nam, H., Saltzman, E. and Chitoran, I.
2009 Coupled oscillator planning model of speech timing and syllable
structure. In G. Fant, H. Fujisaki and J. Shen (eds.), Frontiers in
Phonetics and Speech Science, Beijing: The Commercial Press,
239–250.
Grice, M.
1995 Leading tones and downstep in English. Phonology 12(2), 183–233.
Hermes, A., Grice, M., Mücke, D. and Niemann, H.
2008 Articulatory indicators of syllable affiliation in word-initial
consonant clusters in Italian. In Proceedings of the 8th International
Seminar on Speech Production, Strasbourg, France, 433–436.
Honorof, D. and Browman, C.
1995 The center or edge: How are consonant clusters organized with
respect to the vowel? In K. Elenius and P. Branderud (eds.),
Proceedings of the 13th International Congress of Phonetic
Sciences, Stockholm: KTH and Stockholm University, 552–555.
Fang Hu
Abstract
This paper proposes how laryngeal complexity (tone) could emerge from sequential
complexity (consonant clusters) by examining tonogenesis in Lhasa Tibetan on the
basis of articulatory and acoustic data recorded with an Electromagnetic Articulograph
(EMA, the Carstens AG500 system) from three native speakers. The acoustic data con-
firmed the high-low contrast of tones in Lhasa on the one hand and a high correlation
between tonal contours and syllable types on the other. In other words, the high-low
contrast emerged earlier than contour contrast in Lhasa tonogenesis, which differs
from the classical Vietnamese case (Haudricourt 1954) and Chinese case (Pulleyblank
1962). The intergestural timing revealed a C-center organization for Lhasa syllable
production, namely the vowel gesture begins around the midpoint between the con-
sonant gesture and tone gesture. That is, the tone gesture is coordinated like an additional
consonant in the CV production. Results suggest that Lhasa tonogenesis followed
general coupling principles in syllable production (Nam, Goldstein and Saltzman
2010); in the long-term historical development, the competitive coupling relations
initiated the simplification process for Lhasa consonant clusters, and finally the tone
gesture emerged as an integrated component of syllable production.
1. Introduction
the simplification of syllable initials and rimes. First, tonal contours emerged
from different rime types, e.g., level tone from open syllable, falling tone from
aspirated syllable, and rising tone from checked syllable. Second, high vs. low
register contrasts further developed from the loss of voicing distinction in
syllable initials. These two basic mechanisms were generally accepted in the
field of historical linguistics in accounting for the tonogenesis in Sino-Tibetan
languages (e.g., Pulleyblank 1962; Mei 1970). And phonetic research, in
general, demonstrated that these mechanisms are supported by empirical data
(Hombert, Ohala and Ewan 1979). According to Hombert, Ohala and Ewan
(1979), a number of segmental effects, such as initial voicing, postvocalic
glottal stop or fricative, and phonation type, quite naturally have a perturbation
effect on the fundamental frequency (F0) of the adjacent vowels within a syllable
in both tonal and non-tonal languages. And, tone emerges when an intrinsic
(F0) perturbation comes to be used extrinsically (p. 37).
Thus, a key issue of the inquiry into tonogenesis is to explain how an
intrinsic F0 perturbation in a non-tonal language evolves into an extrinsic
linguistic contrast in tonal languages. Both non-tonal and tonal languages
share the commonality that F0 is first of all a global intonational function of
sentence production, but they differ in that F0 is additionally localized in the
syllable production in tone languages. If F0 perturbation, which is riding
on the global sentence intonation in a non-tonal language, emerges as a local
event, i.e. linguistic tone, the production of F0 should be bound to the produc-
tion of the syllable. On the other hand, tonogenesis is accompanied by the sim-
plification of syllable structure. While syllables are becoming sequentially
simpler with the loss of consonant clusters, syllable production is characterized
by a new structural complexity in tone languages, namely a laryngeal gesture
is simultaneously superimposed upon supralaryngeal articulations. The question
is how. Current phonology adopts an autosegmental view on the relation
between laryngeal and supralaryngeal articulations. That is, tone and segments
are parallel to each other, and they are associated by lines in an abstract
fashion. In the research line of phonetics, however, the temporal alignment
between tone and segments demonstrates stable, concrete patterns both in
tone languages (Xu 1998, 1999, 2005) and in non-tonal languages (DImperio
et al. 2007; Mcke et al. 2009). Explicitly, Articulatory Phonology (Browman
and Goldstein 1986, 1988, 1992) looks into the coordination structure between
individual articulations. Each individual articulation, or gesture, is an action
unit which involves a formation and release of a particular constriction in the
vocal tract. Unlike traditional phonological concepts, which are claimed to
be autonomous or linguistic internal, gestures in articulatory phonology follow
Tonogenesis in Lhasa Tibetan – Towards a gestural account 233
2. Methodology
1. The speakers read these citation syllables with learned pronunciation, which
differs from the colloquial form mainly in that more orthographic information is
retained in the learned pronunciation. For instance, Lhasa was reported as having
long and short open (CV) syllables, and was thus treated as being contrastive in
vowel duration in the literature (e.g., Jin ed. 1983; Qu 1981). But in the citation
monosyllables, long open syllables are normally pronounced with a liquid coda
according to the orthographic spelling.
2. The aspirated syllables were transcribed as short open syllables in the literature.
It's true that the syllable-final aspiration diminishes or disappears if the aspirated
syllable occurs in an unstressed position in running speech. But in the citation
form, these short open syllables are clearly aspirated, i.e. pronounced with a
syllable-final glottal fricative.
3. CVʔN syllables are grouped with CVʔ syllables in the literature. Here, CVʔN
syllables are treated as a different syllable type since (1) they have a complex
coda, and (2) they have a longer duration than CVʔ syllables, which consequently
may affect their tonal development.
236 Fang Hu
synchronized audio recording. Three native Lhasa female speakers were re-
corded. They were all first-year or second-year undergraduate students, 20 to
21 years old, at the Minzu University of China in Beijing.
The sensors were attached to the speakers' articulators along the mid-
sagittal plane: two on the tongue (tip and body), two on the lips (lower and
upper lip respectively), and one on the gum ridge at the lower incisors
(jaw). Three additional sensors on the bridge of the nose and behind the left
and right ears served as references to compensate for head movements. The
articulatory data were sampled at 200 Hz and smoothed with a 12 Hz low-pass
filter. The acquired data were further corrected for head movements, and then
rotated and translated to the speakers' occlusal plane.
The consonant gesture in the target syllable was characterized by lip
aperture, i.e. the calculated Euclidean distance between the lower and upper
lip sensors. The vowel gesture was characterized by the kinematics of the
tongue body sensor. The tone gesture was, however, based on the acoustics,
i.e. the fundamental frequency (F0). Due to the limitation of research techni-
ques, the periodicity of vocal folds was not directly measured in this study.
Alternatively, its acoustic output, F0, was taken as a measure of tone gesture.
Following Gao (2008, 2009; see also Mücke et al. this volume), the preceding
F0 minimum was taken as the onset of a high tone, and the preceding F0
maximum as the onset of a low tone. Physically, laryngeal periodicity is only
observed on the voiced segments. As a result, syllables with a voiceless vs.
voiced initial show an inconsistency. For instance, F0 is observed for the
whole syllable in [mar], but for the rime part only in [par]. In this study, the
tone gesture was measured from a sentential F0 event, and F0 is viewed as
being virtually connected during the production of voiceless [p]. That is, for a
high or low toned [par], for instance, the preceding F0 minimum or maximum
is measured as the onset of the target tone, respectively. And generally, the tone
onset was found around or even shortly before the onset of the target syllable
(see Figure 2 for illustration and Section 4 for detailed results). In this way,
tones on the voiceless and voiced syllables were treated consistently in this
study. In fact, this kind of treatment is in line with the traditional idea that
tone is a syllabic property (Wang 1967; Chao 1968). And furthermore, tone
as a syllable-synchronized feature is supported by the empirical tonal align-
ment data in Mandarin Chinese, a canonical tone language (Xu 2005).
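The extremum criterion for the tone-gesture onset can be sketched as follows; the search window, the synthetic F0 track, and the NaN treatment of the voiceless closure are our illustrative assumptions.

```python
import numpy as np

def tone_onset(f0, t, syl_onset, tone="high", window=0.25):
    """Onset of the tone gesture: preceding F0 minimum for a high tone,
    preceding F0 maximum for a low tone (F0 in Hz, NaN where voiceless)."""
    pre = (t >= syl_onset - window) & (t <= syl_onset + 0.05)
    idx = np.flatnonzero(pre)
    seg = f0[idx]
    pick = np.nanargmin(seg) if tone == "high" else np.nanargmax(seg)
    return t[idx[pick]]

# synthetic F0 track: a dip at t = 0.50 s, then a rise into a high-toned syllable
t = np.arange(0.0, 1.0, 0.005)
f0 = 250.0 - 40.0 * np.exp(-((t - 0.50) / 0.08) ** 2) \
     + 60.0 * (t > 0.6) * (t - 0.6)
f0[(t > 0.55) & (t < 0.62)] = np.nan   # voiceless closure of an initial [p]

onset_t = tone_onset(f0, t, syl_onset=0.55, tone="high")
print(round(onset_t, 3))
```

The detected onset falls at the F0 minimum shortly before the syllable onset, as described for the high-toned [par] case.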
Figure 2 illustrates the acoustic data labeling procedure and Figure 3 illus-
trates the articulatory data labeling procedure for the same high toned p-initial
syllable [par], respectively. The annotations consist of three acoustic levels,
syllable, tone, and target (acoustically defined tone onset), and two articulatory
levels, lip aperture (LA) and tongue body (TB).
Figure 2. Acoustic labeling for [par] in the citation position and sentence-mid
position. Levels of annotation (upper to lower): syllable, target, tone;
signal windows (upper to lower): audio, wideband spectrogram, F0.
The label of syllable delimits the whole syllable, i.e., both consonantal initial
and rime. As shown in Figure 2, the label of syllable [par] includes the rime and
the VOT of the initial consonant for the target syllable in citation position, i.e.,
the first X position in the carrier sentence; and the label of syllable [par]
includes the rime, the VOT, and the acoustic closure part of the initial con-
sonant for target syllables in the sentence-mid position, i.e., the second X
position in the carrier sentence. The label of tone delimits the periodic rime
part in a syllable. Thus, the interval between the onset of syllable and tone
denes the consonant duration. The target syllable in citation position was
labeled acoustically only and the discussion of the acoustic properties of
Lhasa tones in Section 3 is based on these annotated tone segments in cita-
tion positions such that both p-initial syllables and m-initial syllables have
comparable F0 contours and durations. As mentioned above, F0 contours are
considered as being virtually connected across the production of the voiceless
[p]. And the F0 minimum that precedes the target high tone, which is located
around the offset of the preceding syllable [ti], was thus defined as the onset of
the target high tone gesture.
Articulatory annotations apply to the target syllable in the sentence-mid
position. The consonant gesture for the bilabial [p] or [m] was defined by lip
aperture (LA), which is composed of a gesture of lip closing and lip opening.
The production of the vowel [a] or [] was characterized by a lowering gesture
of the tongue body (TB). The annotations were based on the positional data
with reference to the criterion of the tangential velocity minimum. As shown
in Figure 3, the interval from the LA positional peak to its first valley was
labeled as the gesture of lip closing (close), and accordingly the interval from
the valley to the following peak was labeled as the gesture of lip opening
(open). And as shown in the figure, peaks and valleys occur where there are
tangential velocity minima. Similarly, the lingual lowering gesture (lower)
was labeled from a stable higher TB position to a stable lower TB position
where there are tangential velocity minima.
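The tangential-velocity criterion can be sketched on a synthetic tongue-body trace; the trajectory and the small velocity threshold standing in for the minima are our assumptions, not the study's labeling script.

```python
import numpy as np

dt = 0.005                     # 200 Hz sampling, as in the EMA data
t = np.arange(0.0, 2.5, dt)
# synthetic tongue-body trace: lowering from y = 10 mm to y = 5 mm over 1.0-1.5 s
ramp = np.clip((t - 1.0) / 0.5, 0.0, 1.0)
y = 10.0 - 5.0 * (0.5 - 0.5 * np.cos(np.pi * ramp))
x = 2.0 * ramp                 # slight fronting during the same movement

tv = np.hypot(np.gradient(x, dt), np.gradient(y, dt))  # tangential velocity

# movement interval: samples where tangential velocity exceeds a small
# fraction of its peak; its edges approximate the velocity-minimum landmarks
moving = tv > 0.05 * tv.max()
onset, offset = t[np.flatnonzero(moving)[[0, -1]]]
print(round(onset, 2), round(offset, 2))
```

On this trace the labeled lowering gesture spans roughly 1.0 to 1.5 s, bracketed by the near-zero tangential-velocity plateaus on either side.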
Figure 4 gives the mean F0 contours associated with the eight different syllable
type and tone combinations in Lhasa Tibetan from the three female speakers.
The F0 contours were averaged for each combination in the citation position
across all the repetitions of all the tested syllables (refer to 1)–8) in Section
2 for details).
As summarized in Table 1, the F0 contour patterns are quite consistent
across all the three speakers.
Figure 4. Lhasa tones. High CVS: H; low CVS: LH; high CVh: HS; low CVh: LHH;
high CVʔ: HLS; low CVʔ: LHS; high CVʔN: HL; low CVʔN: LHL.
First, there is a clear high vs. low tonal contrast. Acoustically this feature is
manifested on the onset part of the tone. The high tones have a high F0 onset
at around 270–320 Hz and the low tones a low F0 onset at around 190–240
Hz. Second, tonal contours are highly correlated with syllable types. It has
been debated in the literature whether Lhasa has two, four, or six tones. It's
quite clear from the acoustic data presented here that the complementary dis-
Table 1. Syllable types and the emergent tonal melodies in Lhasa Tibetan.
4. Hombert, Ohala and Ewan (1979) concluded that glottal stop has a raising effect
on F0. However, glottal stop could induce a sharp drop in F0, too (Zee and
Maddieson 1979). Moreover, as noted in Tan and Kong (1991: 17), the glottal
stop in Lhasa is actually characterized by glottalization. That is, glottal closure is
realized as creaky voice in the sense of Ladefoged's (1971) continuum of phona-
tion types (see also Gordon and Ladefoged 2001).
F0, and consequently the CVh syllable has a comparable F0 contour to its
CVS counterpart, but is much shorter. In summary, all perturbations in Lhasa
Tibetan have an F0 lowering effect. By contrast, the unperturbed F0 stays as a
high (H) tonal element. Thus, a rising F0 contour was induced by historical
voicing, a falling F0 contour was induced by the glottal stop, and a rising-
falling F0 contour was induced by both of them.
The acoustic results from this study are, in general, consistent with those
from T. Hu, Qu and Lin (1982). The only difference is that this study further
distinguishes two types of checked syllables: CVʔN vs. CVʔ. An eight-tone
analysis is therefore proposed. Although CVʔN and CVʔ share similar F0
contour patterns, the former is considerably longer than the latter (cf. F. Hu
and Xiong 2010). Interestingly, this durational difference has a critical con-
sequence. A sharp drop in F0 signifies the presence of a glottal stop (Zee and
Maddieson 1979), and is thus redundant in nature. However, the glottal stop
is often dropped in natural conversational speech, and consequently the sharp
F0 drop effect disappears. As a result, CVʔ syllables will have a similar F0
contour to the corresponding CVh syllables5. By contrast, a slower drop in
F0 is not a redundant feature, and the falling pitch is always attested in the
production of CVʔN syllables. That is, even when the glottal stop in CVʔN
syllables is weakened or deleted as reported in the literature (e.g. Jin ed.
1983: 13), the falling tonal contour is still retained. To sum up briefly, the
emergent citation tones in Lhasa Tibetan are demonstrating a new direction
of development: while short tones on CVh and CVʔ syllables tend to merge,
short tones on CVʔN are further emerging as contrastive tones (F. Hu and
Xiong 2010).
It has been shown so far that contrastive tones have emerged in Lhasa Tibetan.
Meanwhile, the emergent tonal melodies are highly constrained by syllable
structures, and are thus still under development from a historical phonological
point of view. In this section, the internal articulatory structure of Lhasa
syllable/tone production is examined.
Figures 5–7 show the temporal structure of intergestural coordination for
the syllable production from the three Lhasa speakers respectively. Bars in
and low-toned syllables. Results from the paired t-tests, as listed by the p-
values in the last column of the tables, show that there is no significant differ-
ence between the durations of CV lags and VT lags in most cases for Speakers
1 and 2. Two cases from Speaker 1 and one case from Speaker 2, as signified
by the shaded cells in the tables, show that the difference is significant at the
95% confidence level. However, all cases from Speaker 3 exhibit a highly
significant difference between the durations of CV and VT lags, and the CV
lag is larger than the VT lag7.
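A paired comparison of CV and VT lags of the kind reported here can be run with SciPy; the per-token lags below are synthetic, drawn with no built-in CV-VT difference, so the test typically comes out non-significant, mimicking the C-center pattern of Speakers 1 and 2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# synthetic per-token lags (ms): VT lags drawn around the same mean as CV lags
cv_lag = rng.normal(loc=55.0, scale=12.0, size=40)
vt_lag = cv_lag + rng.normal(loc=0.0, scale=10.0, size=40)

tstat, p = stats.ttest_rel(cv_lag, vt_lag)
print(f"t({len(cv_lag) - 1}) = {tstat:.2f}, p = {p:.3f}")
```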
The dataset presented in the present study is three-way in nature. That is,
there are four syllable types (CVS, CVʔN, CVʔ, and CVh), two types of
initial consonants ([p m]), and two types of tones (high and low). To examine
further whether syllable type, initial consonant and tone have an effect on the
C-center-like alignment, a 3-way ANOVA with repeated measures was conducted
on the difference between CV lag and VT lag for each speaker. Results from
Speaker 1 indicate no significant effect of syllable type (F(3,320) = 0.6463,
p = 0.5858), initial consonant (F(1,320) = 0.0272, p = 0.8692), and tone
(F(1,320) = 0.0519, p = 0.8200). And there is no significant effect of interac-
tions between syllable type and initial consonant (F(3,320) = 0.5314, p = 0.6611),
between syllable type and tone (F(3,320) = 1.0751, p = 0.3598), between ini-
tial consonant and tone (F(1,320) = 0.1604, p = 0.6890), and between syllable
type, initial consonant and tone (F(2,320) = 1.2462, p = 0.2890). Results from
Speaker 2 indicate no significant effect of syllable type (F(3,208) = 0.7506,
p = 0.5231), initial consonant (F(1,208) = 1.0917, p = 0.2973), and tone
7. This is mainly attributable to the fact that this speaker habitually placed extra
focus on the target syllables when reading them in carrier sentences. As shown in
Figure 7, the lip gesture of this speaker is characterized by a long closing phase.
On the other hand, in order to get an unbiased overall picture of the intergestural
coordination of Lhasa syllable production, this study does not use any threshold
in defining articulatory gestures, and thus could not control this sort of variability
induced by the speaker's reading style.
Figure 8. Coupling structure for the consonant, vowel and tone gestures in Lhasa
Tibetan. Solid line: in-phase coupling; dotted line: anti-phase coupling.
5. Conclusion
The acoustic data confirmed the high-low contrast of tones in Lhasa on the
one hand and a high correlation between tonal contours and syllable types
on the other. That is, different from the classical Vietnamese case (Haudricourt
1954) and Chinese case (Pulleyblank 1962), the high-low contrast emerged
earlier than contour contrast in Lhasa tonogenesis. And in general, the results
are in line with the tonogenesis mechanisms proposed in Hombert, Ohala and
Ewan (1979), namely the intrinsic segmental perturbation on F0 was or is
being extrinsically used and was or is being internalized in the grammar.
Meanwhile, the Lhasa case also demonstrated language-specific mechanisms:
(1) the syllable-final glottal stop produced a large drop, rather than a rise, in
F0; (2) the syllable-final aspiration did not have much effect on F0.
The intergestural timing revealed a C-center organization for the Lhasa
syllable production, namely the vowel gesture begins approximately at the
midpoint between the consonant gesture and tone gesture. That is, the tone
gesture is coordinated like an additional consonant to the CV production. The
Lhasa case corroborates the results from Mandarin Chinese (Gao 2008, 2009),
a canonical syllable tone language, but differs from sentential pitch accents in
non-tonal languages such as Catalan and German (Mücke et al. this volume).
Unlike in tone languages, where tones are lexical representations and are thus
locally integrated in the coupling relation of syllable production, sentential
pitch accents occur as a post-lexical event in non-tonal languages, namely the
alignment of the tone gesture doesn't affect the coordination structure of the
consonant and vowel gestures (Mücke et al. this volume). It seems that the
Lhasa case follows general coupling principles in syllable production (Nam
and Saltzman 2003; Nam 2007; Nam, Goldstein and Saltzman 2010), and in
the long-term historical development, the competitive coupling relations initiated
the simplification process for Lhasa consonant clusters, especially in the prevo-
calic position, and finally the tone gesture emerged as an integrated component
of syllable production. It should be admitted, however, that in order to have
250 Fang Hu
Acknowledgment
References
Gao, Man
2009 Gestural coordination among vowel, consonant and tone gestures in
Mandarin Chinese. Chinese Journal of Phonetics 2: 43–50.
Goldstein, Louis, Dani Byrd and Elliot Saltzman
2006 The role of vocal tract gestural action units in understanding the
evolution of phonology. In Michael A. Arbib (ed.), Action to
language via the mirror neuron system, pp. 215–249. Cambridge:
Cambridge University Press.
Gordon, Matthew and Peter Ladefoged
2001 Phonation types: a cross-linguistic overview. Journal of Phonetics
29: 383–406.
Haudricourt, André-Georges
1954 De l'origine des tons en vietnamien. Journal Asiatique 242: 69–82.
Hombert, Jean-Marie, John J. Ohala and William G. Ewan
1979 Phonetic explanations for the development of tones. Language 55:
37–58.
Hu, Fang and Ziyu Xiong
2010 Lhasa tones. Speech Prosody 2010, 100163: 1–4.
Hu, Tan, Aitang Qu and Lianhe Lin
1982 Experimental studies on Lhasa Tibetan tones [in Chinese]. Yuyan
Yanjiu 2: 18–38.
Huang, Bufan
1994 Conditions for tonogenesis and tone split in Tibetan dialects [in
Chinese]. Minzu Yuwen 3: 1–9. English translation by Jackson T.-S.
Sun in Linguistics of the Tibeto-Burman Area 18: 43–62, 1995.
Jiang, Di
2002 Studies on Tibetan historical sound change [in Chinese]. Beijing:
Minzu Press.
Jin, Peng (ed.)
1983 An introduction to the Tibetan language [in Chinese]. Beijing: Minzu
Press.
Karlgren, Bernhard
1915–26 Études sur la phonologie chinoise. Archives d'Études Orientales,
Vol. 15. Leyde et Stockholm.
Ladefoged, Peter
1971 Preliminaries to linguistic phonetics. Chicago: The University of
Chicago Press.
Li, Fang-Kuei
1977 A handbook of comparative Tai. Honolulu: University Press of
Hawaii.
Maspéro, Henri
1912 Études sur la phonétique historique de la langue annamite: les ini-
tiales. Bulletin de l'École Française d'Extrême-Orient 12.1: 1–124.
Matisoff, James A.
1973 Tonogenesis in Southeast Asia. In Larry M. Hyman, ed., Consonant
Types and Tone, pp. 71–95. Southern California Occasional Papers
in Linguistics, No. 1. Los Angeles: USC.
Matisoff, James A.
1999 Tibeto-Burman tonology in an areal context. In Shigeki Kaji, ed.,
Proceedings of the Symposium Cross-Linguistic Studies of Tonal
Phenomena: Tonogenesis, Typology, and Related Topics, pp. 3–32.
Tokyo: Institute for the Study of Languages and Cultures of Asia
and Africa, Tokyo University of Foreign Studies.
Mazaudon, Martine
1977 Tibeto-Burman tonogenetics. Linguistics of the Tibeto-Burman Area
3.2: 11–23.
Mei, Tsu-Lin
1970 Tones and prosody in Middle Chinese and the origin of the rising
tone. Harvard Journal of Asiatic Studies 30: 86–110.
Mücke, Doris, Martine Grice, Johannes Becker and Anne Hermes
2009 Sources of variation in tonal alignment: evidence from acoustic and
kinematic data. Journal of Phonetics 37: 321–338.
Mücke, Doris, Hosung Nam, Anne Hermes and Louis Goldstein
2012 Coupling of tone and constriction gestures in pitch accents, this
volume.
Nam, Hosung
2007 Syllable-level intergestural timing model: Split-gesture dynamics
focusing on positional asymmetry and moraic structure. In J. Cole
& J. I. Hualde (eds.), Laboratory Phonology 9, pp. 483–506. Berlin,
New York: Mouton de Gruyter.
Nam, Hosung, Louis Goldstein and Elliot Saltzman
2010 Self-organization of syllable structure: a coupled oscillator model.
In F. Pellegrino, E. Marisco & I. Chitoran (Eds.), Approaches to
phonological complexity. Berlin, New York: Mouton de Gruyter.
Nam, Hosung and Elliot Saltzman
2003 A competitive, coupled oscillator model of syllable structure. In
Proceedings of the 15th ICPhS, pp. 2253–2256, Barcelona, Spain.
Pulleyblank, Edwin G.
1962 The consonantal system of Old Chinese, Part II. Asia Major 9:
206–265.
Qu, Aitang
1981 Tibetan tone and its historical development [in Chinese]. Yuyan
Yanjiu 1: 177–194.
Sun, Jackson T.-S.
1997 The typology of tone in Tibetan. Chinese Languages and Linguistics
IV: Typological Studies of Languages in China (Symposium Series
of the Institute of History and Philology, Academia Sinica, Number
2), 485–521. Taipei: Academia Sinica.
Tonogenesis in Lhasa Tibetan – Towards a gestural account 253
Natalie Boll-Avetisyan
Abstract
Long-term memory representations that facilitate short-term memory (STM) recall
have been found to also facilitate lexical acquisition (e.g. Gathercole 2006). Such
facilitation comes, for example, from probabilistic phonotactics. It is controversial
whether probabilistic phonotactic knowledge is informed by abstractions from lexical
entries or also by sub-lexical representations. When disentangling the two, previous
studies found lexical effects but had difficulties demonstrating sub-lexical effects on
STM recall (e.g. Roodenrys & Hinton 2002). It is, however, paradoxical to need a
lexicon for lexical acquisition. Strikingly, previous studies had only used CVC nonwords
as stimuli. We hypothesize that sub-lexically represented probabilistic phonotactics are
informed by abstract knowledge about phonological structure. Consonant clusters
in syllable margins are structurally more restricted than CV or VC strings. Hence, sub-
lexical effects should increase with syllable complexity. This was tested with Dutch
adults in an STM recognition task. As expected, recognition was faster for nonwords
of high than of low phonotactic probability. The effect was present when complex
syllables were used, but not when syllables were simple. A second experiment that
controlled for stimulus duration, as longer stimuli had provoked longer recall latencies,
replicated the result. The study opens up the possibility that sub-lexical knowledge
bootstraps lexical acquisition.
1. Introduction
reflect the role of STM in lexical acquisition. In fact, nonword recall can be
seen as the initial step in storing new words in the mental lexicon. The easier
it is to hold an item in STM, the easier it is to store it in long-term memory
(LTM). Interestingly, STM recall is affected by LTM representations. Hence,
it has been suggested that LTM knowledge is used for reconstructing degraded
memory traces during sub-vocal rehearsal in STM, a process referred to as
redintegration (Schweickert, 1993).
As to the role of phonotactics in STM recall, a pioneering study by Gathercole
and colleagues (Gathercole, Frankish, Pickering, & Peaker, 1999) showed that
seven- to eight-year-old children were better at recalling CVC nonwords with
high rather than low phonotactic probability in serial nonword repetition tasks.
Moreover, children who did particularly well at relying on cues from phono-
tactic probability in nonword repetition were shown to have a larger lexicon
size than children who did less well in relying on the phonotactic cues. Similar
results have been found in a study with adult L2 learners (Majerus, Poncelet,
Van der Linden, & Weekes, 2008).
Although the studies reviewed above may seem to offer evidence for direct
effects of probabilistic phonotactics on lexical acquisition, it is important to
guard against premature conclusions, given that there are two levels of
processing from which probabilistic phonotactic information can be derived.
One is a sub-lexical level, at which knowledge of sub-word units, such as
phonemes and biphones and their probability of occurrence (e.g., Vitevitch &
Luce, 1999) is represented. The other is the lexical level, at which the phono-
logical forms of words and morphemes are represented. Probabilistic phono-
tactics can be deduced from lexical items by comparing phonologically similar
items and their frequencies.
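The counting step behind probabilistic phonotactics can be illustrated with a small sketch. The helper name and the toy lexicon below are hypothetical (not the Dutch counts used in this study); the sketch simply computes the relative frequency of each adjacent phoneme pair over a transcribed word list, with one symbol per phoneme:

```python
from collections import Counter

def biphone_frequencies(lexicon):
    """Relative frequency of each biphone (adjacent symbol pair) over a
    list of phonemically transcribed words, one symbol per phoneme."""
    counts = Counter()
    for word in lexicon:
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    total = sum(counts.values())
    return {biphone: n / total for biphone, n in counts.items()}

# Toy lexicon: /al/ occurs twice out of eight biphone tokens
lexicon = ["bal", "bel", "lam", "mal"]
freqs = biphone_frequencies(lexicon)
```

A positional variant, as used for the stimuli here, would additionally condition the counts on syllable position (onset, nucleus, coda).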
As mentioned before, it would be helpful for a language-learning child or
an L2 learner if (s)he could draw on phonotactic knowledge to facilitate
word learning before the onset of lexical acquisition. To open up this possibility,
it is necessary to distinguish between lexical and sub-lexical knowledge, as
only the latter can possibly be acquired independently of the lexicon. The
study by Gathercole et al. (1999) was later criticized on the grounds that
when manipulating sub-lexical factors (such as biphone frequencies), lexical
factors (such as lexical neighborhood density) had not been controlled for
(Roodenrys & Hinton, 2002). Lexical and sub-lexical probabilities are highly
correlated: words composed of high frequency biphones tend to have many
lexical neighbors (e.g., Landauer & Streeter, 1973). Experimental evidence
suggests that both lexical and sub-lexical factors function as independent
predictors of well-formedness judgments on nonwords, even though they are
highly correlated (e.g., Bailey & Hahn, 2001). Furthermore, they are known
260 Natalie Boll-Avetisyan
(e.g., Dupoux, Kakehi, Hirose, Pallier, & Mehler, 1999; Pierrehumbert, 2003)
have equated the sub-lexical level, as it is phonological by definition, with
a phonological level that represents a phonological grammar. Under current
conceptualizations, phonological grammar can be interpreted as a knowledge
system that contains, among others, markedness constraints referring to abstrac-
tions of structure, such as syllable structure (Prince & Smolensky, 1993/2004).
Markedness constraints ban marked structures, such as complex syllable onsets
and codas. The notion of a phonological grammar is supported by data from
typology, acquisition and processing. Typologically, languages that tolerate
complex syllable margins, such as Dutch and English, also tolerate simple
syllable margins; in contrast, however, there are no languages that tolerate
complex syllable margins, but disallow simple syllable margins. This implica-
tion indicates that complex constituents are more marked than simple constit-
uents. Complex constituents are restricted by the constraints *Complex-Onset,
penalizing consonant clusters in syllable onsets, and *Complex-Coda, penal-
izing consonant clusters in syllable codas. CVC syllables, however, are rela-
tively unmarked, being constructed of simple syllable constituents only.
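The two constraints can be illustrated with a minimal sketch. The function name is hypothetical, and the onset/coda split assumes a monosyllabic skeleton with exactly one V slot, a simplification of full syllabification:

```python
def markedness_violations(skeleton):
    """Violations of *Complex-Onset and *Complex-Coda for a monosyllabic
    CV skeleton such as 'CCVCC' (exactly one V slot assumed)."""
    v = skeleton.index("V")
    onset, coda = skeleton[:v], skeleton[v + 1:]
    return {"*Complex-Onset": int(len(onset) > 1),
            "*Complex-Coda": int(len(coda) > 1)}

# Violation profiles for the four stimulus types used below
profiles = {sk: markedness_violations(sk)
            for sk in ["CVC", "CVCC", "CCVC", "CCVCC"]}
```

On this toy evaluation, CVC violates neither constraint, CVCC and CCVC each violate one, and CCVCC violates both, mirroring the markedness ordering assumed in the text.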
It has been noted that unmarked structures are acquired earlier than marked
structures (e.g., Jakobson, 1969; Smolensky, 1996). This has also been found
to be the case with syllable structure. Dutch children start producing CV
and CVC syllables before CVCC and CCVC syllables. CCVCC seems to be
even harder to acquire, possibly due to a conjoined constraint *Complex-
Onset&*Complex-Coda (Levelt & Vijver, 1998/2004). This means that
words with a CCVCC structure are intrinsically more marked than words
with a CVC structure, even if complex constituents are phonotactically legal,
as in languages such as English or Dutch. When learners are faced with the
task of learning a word, they need to make reference to their phonological
grammar, which informs them about whether the form violates or satises
markedness constraints. When a word of a simple CVC structure is acquired,
markedness constraints will hardly be violated. The more complex a structure,
the more relevant markedness constraints will be for the processing system.
The reference to the phonological grammar when acquiring a word should
furthermore have an effect on the inuence of probabilistic phonotactics.
The aforementioned studies on the role of sub-lexical representations in
lexical acquisition have not only neglected the fact that sub-lexical processing
between words containing biphones with low versus high phonotactic proba-
bility should be larger for words containing structurally complex syllable
constituents (such as complex onsets or complex codas) than for nonwords
containing structurally simple syllable constituents (such as singleton onsets
and codas).
The hypothesis was tested in two STM experiments with adult native
speakers of Dutch. Dutch allows for complex syllable onsets and syllable
codas. Yet the markedness interpretation of structural complexity predicts that
complex onsets and complex codas should be less well-formed than simple
onsets and codas. I used a probed STM recognition task (Sternberg, 1966),
which has the advantage that no articulatory motor programs are co-activated.
This is a change from previous studies on probabilistic phonotactics in lexical
acquisition, which mostly used production tasks (e.g., Gathercole et al., 1999;
Roodenrys & Hinton, 2002; Storkel et al., 2006; Thorn & Frankish, 2005). A
study (Storkel, 2001) that used perception-based tasks revealed facilitatory
effects of high phonotactic probability to the same extent as a production-
oriented task.
The prediction was that with nonword stimuli doubly manipulated for both
phonotactic probability and syllabic complexity, phonotactic probability would
affect recognition performance such that high biphone frequency would
facilitate nonword recognition, but only, or more strongly so, in the case of
complex syllables.
2. Experiment 1
2.1. Method
2.1.1. Participants
Participants were 30 native speakers of Dutch without self-reported hearing
disorders. All were drawn from the Utrecht Institute of Linguistics participant
pool and compensated for participation.
2.1.2. Materials
All stimuli were nonwords that are phonotactically legal in Dutch. That is, all
phonemes are part of the Dutch phoneme inventory and all biphones are licit
sequences in the syllable positions in which they occur. The stimuli were
manipulated for two factors. One was syllable structure type, a factor of four
levels (CVC, CVCC, CCVC, and CCVCC). The second was biphone fre-
quency, a factor of two levels (high versus low biphone frequency). Biphone
The target items were created such that they would only minimally differ
from each other, such as the low biphone probability nonwords /lum/, /lump/,
and /xlump/, or the high biphone probability nonwords /vo:k/ and /vo:kt/.
In this way, interference of singleton frequency effects, which are known
to influence phonotactic well-formedness (e.g., Bailey & Hahn, 2001), was
minimized.
The CVC, CCVC, CVCC and CCVCC filler items used in this experiment
were randomly selected from a list of Dutch nonwords. Each filler item occurred
only once throughout the experiment. The stimuli were spoken in a sound-proof
booth by a Dutch native speaker, who was naïve as to the purpose of the study.
Two Dutch native speakers confirmed that the stimuli sounded natural. A list
of all target items is given in Appendix A.
2.1.3. Procedure
Participants were tested in a probed recognition task (Sternberg, 1966), in
which they were presented a series of four nonwords followed by a probe.
Probabilistic phonotactics in lexical acquisition 265
The task was to decide whether the probe was in the series or not. Each series
contained one target and three filler items. The series were designed such that
every syllable type occurred once in each trial. Every series thus contained the
same number of segments. An example is given in Table 2.
Table 2. Examples of both a target and a filler series used in Experiments 1 and 2a.
The experiment consisted of 184 target series. The design had two factors
(biphone frequency, syllable structure) with 2 × 4 levels (high/low; CVC,
CVCC, CCVC, CCVCC); accordingly, the 184 targets divide into 23 targets
of each type. In addition, 184 filler series, in which the probe did not match
any of the prior four fillers, were included. All series and all items within the
series were randomized for every participant in order to avoid item- or series-
specific order effects.
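The series design described above, one target plus one filler of each remaining syllable type in random order, can be sketched as follows (function and item names are hypothetical; this is not the software used in the study):

```python
import random

SYLLABLE_TYPES = ["CVC", "CVCC", "CCVC", "CCVCC"]

def build_series(target, target_type, fillers_by_type, rng):
    """One trial: the target plus one filler of each remaining syllable
    type, shuffled so that every syllable type occurs exactly once."""
    items = [(target, target_type)]
    for syl_type in SYLLABLE_TYPES:
        if syl_type != target_type:
            items.append((fillers_by_type[syl_type].pop(), syl_type))
    rng.shuffle(items)
    return items

rng = random.Random(0)  # fixed seed only to make the example reproducible
fillers = {t: [f"filler_{t}_{i}" for i in range(3)] for t in SYLLABLE_TYPES}
series = build_series("vo:kt", "CVCC", fillers, rng)
```

Because each series contains one item of each syllable type, every series also contains the same number of segments, as the text notes.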
Participants were tested individually in a soundproof booth facing a
computer screen. The stimuli were played by computer over headphones at a
comfortable volume. The stimuli were presented with a relatively long inter-
stimulus-interval (ISI) of 700 ms, and the probe was presented 1400 ms after
the last item of the stimulus series. This was done to add a high memory load
to the task to invoke reference to more abstract phonological representations
(e.g., Werker & Logan, 1985).
Yes/No-decisions were made on a button-box. After the end of each series,
an icon appeared on the screen indicating the beginning of a new trial. The
dependent measure was reaction time (RT). When no decision was made after
3000 ms, the trial was stopped and counted as an error. The experiment took
90 minutes. Three breaks were included. There were three practice trials
before the experiment started.
2.2. Results
2.2.1. Reaction times
A linear mixed regression model with RT as the dependent variable, Participants
and Targets as random factors, and Biphone frequency (high/low), Syllable
structure (CVC, CCVC, CVCC, CCVCC) and Biphone frequency * Syllable
structure as fixed factors revealed significant differences between nonwords
of different syllable structures as well as interaction effects between syllable
structure and biphone frequency, but no significant differences between high
and low biphone frequency nonwords (see Appendix B).
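The structure of such a model can be sketched with statsmodels on synthetic trial-level data (column names and effect sizes below are invented for illustration). Note one simplification: statsmodels' MixedLM fits a single grouping factor, so only the by-subject random intercept is shown, whereas the analysis reported here also crossed random item effects:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
structures = ["CVC", "CVCC", "CCVC", "CCVCC"]
rows = [
    {"subject": s, "biphone_freq": bf, "syll_structure": st,
     # synthetic RTs, purely illustrative: low-frequency and more
     # complex items are made slower, plus Gaussian noise
     "rt": 1100 + 20 * structures.index(st)
           + (50 if bf == "low" else 0)
           + rng.normal(0, 30)}
    for s in range(6)
    for bf in ["high", "low"]
    for st in structures
    for _ in range(5)
]
df = pd.DataFrame(rows)

# Fixed effects: biphone frequency, syllable structure, and their
# interaction; random intercepts by subject.
fit = smf.mixedlm("rt ~ biphone_freq * syll_structure",
                  df, groups=df["subject"]).fit()
```

The interaction terms in `fit.params` correspond to the Biphone frequency * Syllable structure effects discussed in the text.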
Table 3. Estimated reaction time averages and the differences (Δ) between reaction
time means for high versus low biphone frequency in Experiment 1, in ms,
measured from the target onset.

Syllable structure    High       Low        Δ        Total M
CVC                   1117.82    1103.75    14.70    1110.79
CVCC                  1114.42    1172.82    58.40    1143.62
CCVC                  1107.68    1182.36    74.68    1145.02
CCVCC                 1133.08    1221.40    88.32    1177.24
Total M               1118.25    1170.08    51.68    1144.17
An analysis of the estimates of the fixed effects with low biphone frequency
CVC as a reference point revealed a significant interaction of biphone fre-
quency and syllable structure between the simple syllable structure CVC and
the more complex syllable structures CCVC and CCVCC (see Appendix C).
That is, compared to high and low biphone frequency CVC nonwords, partic-
ipants were significantly slower at recalling CCVC and CCVCC nonwords of
low biphone frequency than CCVC and CCVCC nonwords of high biphone
frequency (see Figure 1). The averages of the estimated RT means are given
in Table 3.
2.2.2. Exploration
The results in Experiment 1 may, however, be due to another factor, which we
had not controlled for: high biphone frequency words are generally spoken
faster than low biphone frequency words (Kuperman, Ernestus, & Baayen,
2008). In order to check whether such a difference might account for our
results, we measured the duration of our target items. We did not find that our
speaker had pronounced all high biphone frequency target items in a shorter
time than the low biphone frequency target items. However, we observed that
the spoken durations of each item type overall matched the respective reaction
times. So, both durations and reaction times were longer for low rather than
for high biphone frequency CCVC and CCVCC nonwords. For CVC non-
words, this effect was reversed (compare Figure 1 and Figure 2).
This is problematic, as speech-rate is known to affect recall latencies: the
longer the duration of an item, the more difficult it is to recall (e.g., Baddeley,
Thomson, & Buchanan, 1975; Cowan, Wood, Nugent, & Treisman, 1997).
Hence, the faster RTs on high biphone frequency items may be due to the
fact that they were shorter in duration. We added speech-rate as a co-variate
to the analysis, and found the effects to remain signicant.
Figure 2. Duration means and SDs in ms of the high versus low biphone frequency
target items for each syllable structure in Experiment 1.
2.3. Discussion
It was predicted that, when holding CVC items in the phonological loop, there
should be little support from sub-lexical LTM representations, as CV and VC
biphones are hardly restricted by structural constraints. CC biphones, on the
contrary, are much more restricted. Hence, reference to sub-lexical LTM
representations with representations of specific biphones making reference to the
phonological grammar should be important in preventing complex CVCC,
CCVC, or CCVCC items from decaying in the phonological loop. Hence, it
was predicted that effects of biphone frequency on STM recognition perfor-
mance would increase with increasing syllable complexity. This effect should
occur while lexical neighborhood frequency is controlled for, to make sure
that effects relate to sub-lexical rather than lexical knowledge.
As displayed in Figure 1, the result is as predicted: The differences in
recognition performance between high and low biphone frequency in inter-
action with syllable structure increased from simple to complex structures.
3. Experiment 2
3.1. Method
Experiment 2 aimed at replicating the results of Experiment 1 while control-
ling for speech-rate. The experiment was carried out in two conditions: Con-
dition 1 repeated Experiment 1 using exactly the same stimuli, but controlled
for speech-rate. Condition 2 only used the CVC and CCVCC nonwords of
Condition 1. CVCC and CCVC were excluded because the interaction in
Experiment 1 did not occur between CVCC and CVC and was therefore also
not expected to occur here. Furthermore, the interaction was strongest between
CVC and CCVCC nonwords and our hypothesis can also be tested using two
syllable structure types only.
3.1.1. Participants
Sixty native speakers of Dutch without self-reported hearing disorders, all
drawn from the Utrecht Institute of Linguistics participant pool and none of
whom had participated in Experiment 1, participated in the experiment.
They were compensated for their participation.
3.1.2. Materials
The stimuli were identical to those used in Experiment 1, with two differences:
First, the target items were controlled for duration such that for each class of
syllable structure the duration of the stimuli did not differ between high and low
frequency biphone nonwords (see Figure 3). This was realized by manually
adjusting the vowel durations, which produces more natural results than
adjusting the duration of the nonwords as a whole, as durational variation in
natural speech usually affects vowels more than consonants (e.g., Greenberg,
Carvey, Hitchcock, & Chang, 2003). Manipulating vowel length should not
have caused perceptual confusion since long and short vowels contrast in
terms of quality (F1, F2), making them distinguishable. Finally, it was of utmost
importance to maintain the naturalness of the onset and coda constituents, which
are the focus of this study. Using the software Praat (Boersma & Weenink,
2007), manipulations were carried out on both sides: stimuli with long durations
were shortened, and stimuli with short durations were lengthened. To ensure
that they would not alter the vowel quality, manipulations were carried out
only in the middle of the vowel. For shortenings, a portion in the middle was
cut out, and for lengthenings, the waves of the middle were copy-pasted. Two
native speakers of Dutch confirmed that the stimuli sounded natural.
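The mid-vowel cut and copy-paste procedure can be sketched on a sample array (a simplified stand-in for the Praat manipulation; in practice splices are made at zero crossings or over whole pitch periods to avoid clicks, which this sketch ignores):

```python
import numpy as np

def adjust_vowel_duration(samples, v_start, v_end, delta):
    """Shorten (delta < 0) or lengthen (delta > 0) a vowel by removing
    or duplicating samples around the vowel midpoint, leaving the onset
    and coda consonants untouched."""
    mid = (v_start + v_end) // 2
    if delta < 0:
        cut = -delta
        # cut a portion out of the middle of the vowel
        return np.concatenate([samples[:mid - cut // 2],
                               samples[mid + (cut - cut // 2):]])
    # copy-paste the waves of the middle of the vowel
    chunk = samples[mid - delta // 2: mid + (delta - delta // 2)]
    return np.concatenate([samples[:mid], chunk, samples[mid:]])

signal = np.arange(1000)                     # stand-in mono waveform
shortened = adjust_vowel_duration(signal, 300, 700, -100)
lengthened = adjust_vowel_duration(signal, 300, 700, 100)
```

Editing only the vowel midpoint keeps the onset and coda, the constituents under study, acoustically intact, which is the design choice the text motivates.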
Figure 3. Duration means and SDs in ms of the high versus low biphone frequency
target items for each syllable structure in Experiment 2 after controls.
3.1.3. Procedure
There were two conditions for this experiment. Thirty participants were
assigned to Experiment 2a, and thirty participants were assigned to Experi-
ment 2b.
3.2. Results
A linear mixed regression model was calculated. The dependent variable was
RT. Participants and targets were random factors. The fixed factors were
Experimental condition (2a/2b), Biphone frequency (high/low), Syllable struc-
ture (CVC, CCVC, CVCC, CCVCC), and interactions of Biphone frequency *
Syllable structure, Biphone frequency * Experimental condition, Experimental
condition * Syllable structure, and Biphone frequency * Syllable structure *
Experimental condition.
A linear mixed model revealed a significant main effect of condition (F(1,
5.798) = 5.884, p < 0.05), with items overall being recognized 129.78 ms
faster in Experiment 2b than in 2a (SE = 53.69). This difference can be
accounted for by a greater decay of attention in Experiment 2a, since the
4. As Experiment 2a and 2b test the same predictions, the data were pooled to
increase power and to minimize the number of models tested. Separate analyses
of Experiment 2a and 2b did not reveal all predicted effects.
Figure 4. Averages of the estimated means and 95% confidence intervals of the
reaction times in ms for CVC and CCVCC syllable structure nonwords for
high (H) and low (L) biphone frequency.
Appendix D) with CCVCC nonwords being recognized more slowly than CVC
nonwords (see Table 5). Finally, there was a significant interaction between
biphone frequency and syllable structure (see Appendix D). More precisely,
the difference in RTs for high versus low biphone frequency nonwords was
significantly larger among complex CCVCC nonwords than among simple
CVC items (see Figure 4).
3.3. Discussion
As predicted, participants performed significantly better in recognizing high
rather than low biphone frequency nonwords. The effect of biphone frequency
interacts with syllable structure. The difference in RTs between high and low
biphone frequency is larger among complex CCVCC nonwords than among
simple CVC nonwords (see Figure 4), indicating that the effect of probabilistic
phonotactics increases with increasing syllable complexity. As opposed to
Experiment 1, here, biphone frequency affected recognition latencies of CVC
nonwords. On the one hand, this may be caused by the fact that items were
controlled for speech rate. On the other hand, in Experiment 2 the ISIs were
shorter than in Experiment 1, which may have elicited more low-level process-
ing than the longer ISIs in Experiment 1.
4. General discussion
The results of the two experiments in this study indicate that, as hypothesized,
sub-lexically represented knowledge affects phonological memory. Crucially,
the sub-lexical representations that are used for redintegration are twofold,
with low-level probabilistic phonotactics on the one hand, and structural con-
straints as part of a phonological grammar on the other. The interaction of
these two components, i.e., growing effects of phonotactic probability with
increasing structural complexity, indicates that sub-lexical LTM knowledge is
particularly important when rehearsing phonologically complex word forms.
Less sub-lexical LTM knowledge is at play when simple CVC nonwords are
rehearsed in the phonological loop. These results suggest that when processing
hardly restricted CV and VC biphones, listeners make reference to low-level
phonotactic probability knowledge, which, however, does not necessarily need
feedback from a phonological grammar, as is the case when structurally
more restricted CC biphones are processed. With respect to phonological
theory, this study supports the view that the effects of phonological grammar
are not only categorical. In our experiments, all nonwords were made up of
legal structures. They only differed in terms of the probability of biphones.
Hence, the binary grammar distinction between legal and illegal cannot be
the ultimate account. Furthermore, knowledge of phonological grammar seems
to modulate the processing of categorically legal forms depending on their
probability (e.g., Albright, 2009; Boersma & Hayes, 2001; Coetzee, 2008;
Hayes, 2000; Pierrehumbert, 2003).
Future studies may want to investigate whether the additive effect of struc-
tural complexity in low biphone frequency items necessarily relates to two
representational components with probabilities on the one hand and marked-
ness constraints on the other, or whether all effects may be accounted for by
either a grammar or by probabilities.
The result of this study has indirect implications for theories of lexical
acquisition. Factors that influence performance in STM nonword recall tasks
have been suggested to similarly constrain lexical acquisition. Among these
factors is, for example, the mechanism of drawing on LTM representations
such as sub-lexical knowledge. LTM knowledge aids when holding a novel
word form in short-term memory. Similarly, it helps to keep a novel word in
the LTM storage when it has to be remembered for a long time, i.e., when it
has to be acquired. Such conclusions are supported by the fact that perfor-
mance in STM tasks has often been found to be correlated with lexicon size
and lexical development over time (see Gathercole, 2006 for an overview).
The finding that both phonotactic probability and structural knowledge
affect recognition memory thus indicates that each of these two sub-lexical
components may be involved in facilitating lexical acquisition. As lexical
neighborhood density was controlled for, the result must be attributed to
effects of a sub-lexical rather than the lexical level. Thus, the results are con-
sistent with the hypothesis that the dependence between phonotactics and
lexical acquisition is not only unidirectional, with the lexicon informing the
phonological grammar, as is assumed by most phonologists (e.g., Hayes &
Wilson, 2008; Pierrehumbert, 2003). Instead, two interacting sub-lexical knowl-
edge components may play a role in lexical acquisition, in particular when
complex word forms are remembered. This implies a bidirectional dependence.
Considering that phonotactic knowledge is represented at a sub-lexical
level raises the question of how these sub-lexical representations are acquired.
Most studies assume that sub-lexical representations emerge as abstractions
over lexical representations. Pierrehumbert (2003), for example, assumes that
at first the sub-lexical level only contains phonetically detailed representations.
These phonetically detailed representations are used to create lexical represen-
tations. Later, phonotactic knowledge emerges as abstractions over the lexicon.
An alternative view is that phonotactics is acquired bottom-up from speech
(e.g., Adriaans & Kager, 2010; Boll-Avetisyan et al., submitted). For a large
part, the source of lexical acquisition might be continuous speech rather than
isolated words (e.g., Christophe, Dupoux, Bertoncini, & Mehler, 1993). The
advantage of a bottom-up acquisition of phonotactics is that sub-lexical
representations could facilitate lexical acquisition from the start, when the first
words are acquired. The current study cannot provide an ultimate answer to
this question, as here effects of sub-lexical probabilistic and grammar knowl-
edge were tested on nonwords presented in isolation. It would be interesting
for future studies to test whether or not prosodic structure influences the
acquisition of words from continuous speech.
We want to draw attention to the necessity of controlling for speech-rate
in studies that test effects of probabilistic phonotactics on processing. The
need for controlling for speech rate has also been discussed by Lipinsky and
Gupta (2005). They pointed out the relevance of the problem by demon-
strating that the effects of probabilistic phonotactics on processing found by
Vitevitch and Luce (1999) are hard to replicate if speech-rate is controlled
(cf. Vitevitch & Luce, 2005). It is a non-trivial task to estimate the consequences
of the confound for the hypothesis, since words composed of high-frequency
biphones are intrinsically spoken faster (Kuperman et al., 2008). This means
that the two factors are correlated and thus difficult to disentangle under natural
conditions. However, a certain degree of naturalness may have to be sacriced
under experimental conditions if we want to ensure that predicted effects truly
relate to the manipulated factor. Therefore, future studies should take this
confound seriously and control their test stimuli for speech-rate.
Hypothetically, the results of the current study could also be due to a mere
interaction of phonotactic probability with word length determined by the total
number of phonemes rather than the structural difference between CVC and
5. Conclusion
Acknowledgments
I would like to thank René Kager for supervising this project. Furthermore, I
am grateful to Huub van den Bergh for statistical advice, Theo Veenker for
programming the experiment, Frits van Brenk for assistance with Experiment
2 and Mieneke Langberg for speaking the stimuli. This work has benefited
from discussions with Frans Adriaans, Alexis Dimitriadis, Tom Lentz, Johannes
Schliesser, Keren Shatzman and the audiences at TiN-Dag 2006 and CCSC
2008, as well as from comments by two anonymous reviewers. Thanks to
Bettina Gruber for proof-reading an earlier version of the manuscript. This
research was funded by an NWO grant (277-70-001) awarded to René Kager.
278 Natalie Boll-Avetisyan
References
Appendix A
Target Stimuli used in Experiment 1 and 2
High biphone frequency nonwords.
CVC: be:l, bx, de:f, de:k, fo:m, fo:t, xa:k, hs, ja:t, kx, la:m, me:f, ml,
ne:k, ra:l, ra:n, rn, rf, ro:n, s, tx, t, vo:k
CVCC: be:ls, be:rk, bxt, de:ft, de:ks, xa:kt, hst, ja:rt, krk, li:nt, li:ts, me:ft,
mls, ne:ks, rls, rns, rxt, sk, srt, txt, tkt, trm, vo:kt
CCVC: bls, brx, brl, brn, bro:n, dre:k, fro:m, fro:n, fro:t, xra:k, xro:n,
klx, klr, krx, krf, pra:n, prn, sla:m, sla:r, tra:l, tr, trn, twl
CCVCC: blst, ble:rk, brlt, brnt, dre:ks, frls, frns, fro:ns, xra:kt, klrm, kli:nt,
krxt, pli:ts, prk, sla:rs, stk, strt, tra:ls, trk, trxt, trrm, twkt, twlt
Low biphone frequency nonwords.
CVC: br, dyl, hyl, ki:, kux, kym, lux, lum, lt, mp, myt, ryk, sum, sur, tr,
v, wi:, wur, za, zus, zp, zyx, zyl
CVCC: brx, dylk, hylm, ki:s, kus, kymp, lump, lurx, lmp, lmt, mps,
nurn, sums, surp, trf, vi:t, vt, vrf, wumt, wut, wurn, zat, zylm
CCVC: dwi:, dwu, dwur, dw, dwyw, ux, xlun, xlt, knux, knm, kn,
knp, kwi:, smyx, smyt, snum, vlum, vlu, wryk, zw, zwp, zwyx, zwyl
CCVCC: dwi:t, dwumt, dwut, dwurx, dwt, dwrf, dwywt, xlump, xlut,
xlmp, knms, kns, knps, snump, snurn, snurp, vlut, vlmt, xlps, vlyms,
vlps, dwyt, xlumt
Appendix B
Multilevel Model
Reaction times are nested both within individuals and within stimuli. Hence, a
multilevel model is appropriate for the analysis of the results. Failing to take
the different variance components into account results in an underestimation
of the variance, and hence the test statistics will be too optimistic (the
null hypothesis is rejected although the data in fact do not support this conclusion).
In order to test the hypothesis we define several dummy variables,
which are turned on if a response is observed in the respective condition and
turned off otherwise. Let Y(ij) be the response to item i (i = 1, 2, . . . , I)
of individual j (j = 1, 2, . . . , J), and let High biphone frequency(ij), CCVC(ij),
CVCC(ij) and CCVCC(ij) be dummy variables indicating whether the item is
a High biphone frequency nonword and whether it has CCVC, CVCC or CCVCC structure,
respectively. The interaction between biphone frequency and syllable type can
be estimated by defining combinations of the biphone frequency dummy and
the syllable type dummies. Analogous to analysis of variance, a saturated model
with both main and interaction effects can be written as:

Y(ij) = Constant + β1 High(ij) + β2 CCVC(ij) + β3 CVCC(ij) + β4 CCVCC(ij) + β5 High(ij)·CCVC(ij) + β6 High(ij)·CVCC(ij) + β7 High(ij)·CCVCC(ij) + [u0j + ui0 + e(ij)]
The model above consists of two parts: a fixed and a random part (between
square brackets). In the fixed part the constant (i.e. the intercept) represents the
mean of Low biphone frequency CVC items, and the other effects represent
deviations from this average. So, the reaction time to High biphone frequency
CVC items is (Constant + β1), the average of Low biphone frequency CCVC items
is (Constant + β2), etc.
In the random part three residual scores are defined: e(ij), ui0 and u0j. The
last term (u0j) represents the deviation of the average reaction time of individual
j from the grand mean, ui0 represents the deviation of item i from the
grand mean, and finally e(ij) indicates the residual score of individual j on
item i. We assume that these residual scores are normally distributed with an
expected value of 0.0 and variances of σ²e, σ²ui and σ²uj, respectively.
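The mapping from dummy variables to cell means described in this appendix can be made concrete with a few lines of code. In the sketch below the coefficient values are invented for illustration only; they are not the estimates reported in Appendices C and D:

```python
# Hypothetical fixed-effect coefficients for the saturated model
# RT = Constant + b_high*High + b_struct[type] + b_interaction[type]*High.
# All values are invented for illustration.
CONSTANT = 1000.0
B_HIGH = -30.0
B_STRUCT = {"CVC": 0.0, "CVCC": 60.0, "CCVC": 70.0, "CCVCC": 100.0}
B_INTERACTION = {"CVC": 0.0, "CVCC": 40.0, "CCVC": 10.0, "CCVCC": 50.0}

def predicted_mean(high, structure):
    """Fixed part of the model: the intercept is the Low biphone
    frequency CVC cell; all other effects are deviations from it."""
    rt = CONSTANT + B_STRUCT[structure]
    if high:
        rt += B_HIGH + B_INTERACTION[structure]
    return rt

print(predicted_mean(False, "CVC"))   # the intercept: 1000.0
print(predicted_mean(True, "CCVC"))   # 1000 - 30 + 70 + 10 = 1050.0
```

The random part (per-participant, per-item and residual deviations) would be added on top of these cell means when simulating or fitting actual reaction-time data.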
Appendix C
Results of Experiment 1
Estimates of Fixed Effects
Parameter Estimate Standard Error Significance
Intercept (low biphone frequency CVC) 1103.75 44.58 .000
high biphone frequency 14.07 28.84 ns
CCVCC 117.65 28.93 .000
CCVC 78.61 28.91 .007
CVCC 69.07 28.87 .018
high * CCVCC 102.39 40.79 .013
high * CCVC 88.75 40.79 .031
high * CVCC 72.47 40.77 .077
Estimates of Random Parameters
Parameter Estimate Standard Error Significance
σ²ui (items) 6,926.49 1,019.25 <0.001
σ²uj (participants) 47,118.08 12,486.48 <0.001
σ²e (residual) 72,772.29 1,473.28 <0.001
Appendix D
Results of Experiment 2
Estimates of Fixed Effects
Parameter Estimate Standard Error Significance
Intercept (low biphone frequency CVC) 1038.53 29.20 .000
high biphone frequency 38.10 15.17 .012
CCVCC 75.67 15.27 .000
CCVC 79.61 20.19 .000
CVCC 99.09 21.44 .000
high biphone frequency * CCVCC 48.17 20.60 .019
high biphone frequency * CCVC 6.04 28.48 ns
high biphone frequency * CVCC 40.29 29.65 ns
Abstract
The notion of complexity is a central issue in phonology. In acquisition studies as
well as in formal analyses, consonant clusters are widely considered to be an area of
particular complexity. Based on the idea that complex areas might be revealed by production
errors and a later age of acquisition in speakers with more fragile phonological
representations, the present study analyzes consonant productions of children and
adolescents with Specific Language Impairment (SLI). The productions of children
with SLI are compared to those of French typically-developing children with the
aim of gaining a better understanding of the causes and the origin of the difficulties of
the former. Our approach assumes that production data reflect the development of
children's phonological competence, in particular involving issues of syllable structure
complexity. Even though they are not unrelated, phonetic effects on phonological development
are left aside in the present contribution. Our hypothesis is that consonant
clusters are phonologically complex at the syllabic level, and therefore create problems
for speakers with SLI. Our results provide support for this hypothesis and show that
some syllabic positions emphasize the complexity created by consonant clusters.
A large number of studies have addressed this question, yet there has been no
unanimous answer to date. Several ways of computing complexity have been
proposed at each level of phonological analysis, in particular with respect to
the internal structure of segments on the phonemic level (for a recent review,
see Pellegrino 2009). Within the framework of the theory of elements, government
phonology (Kaye, Lowenstamm and Vergnaud 1990), for example,
provides a metric of phonemic complexity based on the number of elements
that constitute segments. Ever since Trubetzkoy (1931), complexity has furthermore
frequently been associated with the notion of markedness (Hayes and Steriade
2004, among others), while from a more phonetic point of view, Lindblom
and Maddieson (1988) propose a classification of consonantal systems based
on their articulatory complexity (e.g. the use of a secondary articulation)
286 Sandrine Ferré, Laurice Tuller, Eva Sizaret and Marie-Anne Barthez
into three categories: simple, elaborated and complex systems. Still other
approaches base phonological complexity on the frequency of occurrence of
a phoneme (Zipf 1935).
Besides answers grounded in theoretical considerations, it is equally possible
to find arguments for defining complexity in acquisition data. Assuming that
the linguistic system of the child grows in complexity, or rather becomes
more complete when developing ("[. . .] and then reorganizing the system to
encompass more data, resulting in more complex structure or more complete
or accurate representation", Vihman 1996: 4), complexity could be revealed in
terms of age of acquisition. The later a phoneme is acquired, the more complex
it presumably would be (Winitz 1969; Ingram 1989; Gierut et al. 1996).
However, this hypothesis raises further questions, in particular about the exact
definition of the age of acquisition of a phoneme. When is a phoneme really
acquired? Many acquisition studies have shown that the position of a segment
in the word plays a major role in the acquisition of phonemes: for example, a
phoneme is acquired earlier in word-initial position than in word-final position
(see Kirk and Demuth (2005) for English; Lleó and Prinz (1996) for German;
or Demuth and Kehoe (2006) for French). Consequently, which criteria should
be used to determine the age of acquisition of a phoneme, and its degree of
phonological complexity?
Phonological complexity is not only inherent in segments, but also depends
on the syllabic structure with which individual phonemes are associated. In this
vein, Cyran (2003) proposed that phonological complexity can occur simultaneously
on two levels of structure: a complex syllable can contain complex
segments.
The definition of syllabic complexity is somewhat more consensual and is
mainly based on the number of constitutive elements of syllabic constituents,
in particular on the number of consonants at the beginning and at the end of
the syllable (e.g. Maddieson 2006). Thus, a CCVC syllable is considered to be
more complex than a CV syllable.
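Under this count-based view, syllabic complexity can be operationalized directly on a CV skeleton. A minimal sketch (the skeleton notation and the function are illustrative assumptions, not a proposal from this chapter):

```python
def syllable_complexity(skeleton):
    """Count the consonants in onset and coda of a single-syllable
    CV skeleton such as 'CCVC'; more consonants = more complex."""
    v = skeleton.index("V")          # position of the nucleus
    onset = skeleton[:v].count("C")  # consonants before the vowel
    coda = skeleton[v + 1:].count("C")  # consonants after the vowel
    return onset + coda

print(syllable_complexity("CV"))    # 1
print(syllable_complexity("CCVC"))  # 3
```

This metric deliberately ignores segmental content and association constraints, which is precisely the limitation the following paragraphs address.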
Crucial within this context is the definition of the syllable itself. Different
syllabic frameworks place greater or fewer constraints on the associations
between segments and syllabic constituents. Taking the example of the coda,
proposals range from the complete absence of a coda constituent (Lowenstamm
1996), through syllabic models that impose certain conditions on which
segments can be associated with the coda (Angoujard 1997), to segmental
association to this constituent without restrictions (Blevins 1995).
Given this definition of the syllable, the degree of syllabic complexity depends
strongly on the constraints of association. Moreover, additional variables, such
Acquiring and avoiding phonological complexity in SLI 287
as stress, can influence the quality of the syllable, and therefore its complex
structure.
1. Note here that /ʁ/ in French is not syllabic and should be associated with a consonantal
syllabic position.
Figure 2. Association to position 3: Italian /fato/ 'destiny'; French /paʁ/ 'part'; Italian
/fattore/ 'factor'.
Due to their sonority, some consonants have a special status in this framework.
Sonorants, glides and the alveolar fricative /s/ are indeed the only
consonants that can be associated to position 3 and be the second
part of a branching constituent. As noted previously, these consonants are the
constitutive elements of consonant clusters in French. When associated to
these specific syllabic constituents (specific in the sense that they are not
permitted in all languages), these sounds increase the level of phonological
complexity. Thus, in order to be able to produce /stʁi/, various constraints first
have to be acquired: a constraint that allows /s/ to be linked to the third position
of an empty grid2, and a constraint that allows the first position to branch
with the sonorant /ʁ/ (see figure 5).
syllabic structures. In the present study, we will look at how these structures
fare, on the one hand, in speakers whose phonology is still emerging (with
the goal of identifying which structures are acquired later, and thus of finding
evidence for the link between phonological complexity and age of acquisition),
and, on the other hand, in speakers for whom phonology is a fragile domain,
and who thus ought to be particularly sensitive to phonological complexity.
We believe that, beyond the consonant cluster itself, production problems
result from the ambiguity created by the increase in the number of possible
associations between segments and syllable structure. Our specific goal is to
provide answers to the following three questions: Does an increase in the
number of consonants in a cluster increase phonological complexity? Do
children with SLI have difficulties overcoming this increasing complexity?
Do syllabic positions play a role in the difficulty of processing consonant
clusters? In a second step, we aim to test the ability of the rhythmic grid proposed
by Angoujard (1997) to account for the results.
3. Method
3.1. Material
An experimental test, the Syllabic Structure & Segments (SSS) Test, was
designed to investigate the acquisition of consonant clusters in typical and
atypical speech development. The test is a repetition task and an adaptation
for French of the Test of Phonological Structure (TOPhS) created by van der
Lely and Harris (1999). The aim was to replicate for French the results
obtained with this test for English-speaking children with SLI (Marshall,
Harris and van der Lely 2003). As the phonological structure of French is
very different from that of English, the SSS Test has different properties
than the TOPhS.
The SSS Test targets the production of specific segments: the sonorants /ʁ/ and /l/,
the glides /j, w, ɥ/ and the voiceless alveolar fricative /s/. These segments are
tested in word-initial, medial and final positions, in one- to three-consonant
sequences. This linear typology was chosen in order to avoid the influence
of a specific syllabic model on the construction of the test.
Both words and non-words are used in the SSS Test in order to evaluate
whether, and to what extent, children rely on the lexicon. It is well known
that children with SLI in particular tend to seek support in their lexical knowledge
to compensate for their deficit (Marshall et al. 2002; Maillart and Parisse
2006). Thus, using non-words in a repetition task is a way of testing phonological
structures without any interference from lexical skills.
Moreover, the stimuli (real and non-words) were no longer than two syllables,
to restrict any potential impact of working memory on the repetition task,
as participants with SLI are known to show difficulties in processing long
words (Gathercole and Baddeley 1990). Finally, only obstruents were used in
the constitution of non-words and in the choice of real words, to avoid any
influence of the late acquisition of fricatives. All items begin with a consonant
and contain one to two vowels. In non-words, the vowel /a/ was primarily
used. Some non-words also contain /e/ or /i/, given the phonotactic constraints
on the glide /ɥ/ in French (Pourin 2003).
It was not possible to cross all variables, due to the phonotactic constraints
of French, especially in the constitution of clusters with /w/ and /ɥ/. For example,
these glides never occur in final two- or three-consonant clusters: */pak/,
*/paskw/, or even in final single position for /ɥ/: */paɥ/. Note also that
level 2 of the cluster size always consists of an obstruent + target consonant
sequence (/patʁi/), and never of a target consonant + obstruent sequence
(/paʁti/). The latter sequence is only tested in final position (level 1+ /paʁt/
vs. level 2 /patʁ/).
Table 2. Summary of the variables used in the construction of the SSS Test.

Word position      Initial          Intervocalic     Final
Size of cluster    1, 2, 3          1, 2, 3          1, 1+, 2, 3
Target consonant   ʁ, l, j, w, ɥ    ʁ, l, j, w, ɥ    ʁ, l, j, w
Type of item       real word,       real word,       real word,
                   non-word         non-word         non-word
In this way, 96 test items were created. Some examples of test items are
given in table 3.
3.2. Participants
28 children with SLI3 and 30 typically-developing children participated in the
experiment. Participants with SLI were divided into two groups. The SLI-7-
10 group consisted of 9 children aged 7 to 10 (M = 8;9, SD = 1;5). The SLI-
11-16 group consisted of 19 adolescents aged 11 to 16 (M = 12;5, SD = 1;2).
All SLI participants were diagnosed between ages 6 and 9 in the same neuropediatric
service at the university hospital in Tours. To ensure the specific
nature of their linguistic problem, an audiogram had attested normal hearing
and a neuropsychological evaluation had tested the level of non-verbal
skills (all subjects showed a Performance IQ > 85). The clinical evaluation
led to a pedopsychiatric consultation if deemed necessary. The severity of
the linguistic problem was classified according to a speech-language therapy
evaluation, with the pathology threshold set at −1.65 SD. Each language
domain was evaluated: articulation, phonological production, active lexicon
and syntactic production by means of the test Épreuves pour l'Examen du
Langage (EEL, Chevrie-Muller, Simon and Decante 1981), lexical comprehension
by means of the Test de Vocabulaire Actif et Passif (TVAP, Deltour
and Hupkens 1980), and morphosyntactic comprehension by means of the
Northwestern Syntax Screening Test (NSST, Lee 1971).
The division between the two SLI groups coincides with the division in the
French school system between primary and middle school, the latter beginning
at age 11. At this point in time major changes occur in the way language is
used at school, and thus in the way speech therapy is managed. This age also
corresponds to the entry into adolescence with prominent developmental
changes for the young speakers. Therefore children and adolescents were con-
sidered separately.
Two groups of typically-developing children were investigated: the TD-3
group included 14 typically-developing 3-year-olds (M = 3;3, SD = 0;5), and the
TD-4 group included 16 typically-developing 4-year-olds (M = 4;6, SD = 0;2).
We also analyzed results from two further control groups of children aged 7
and 11. These children all performed at ceiling in the SSS Test, and
will therefore not be discussed further.
The study compares two types of development: typically-developing children
and children with specific language impairment (SLI)3, each of which is
represented by two age sub-groups, TD-3, TD-4, SLI 7-10 and SLI 11-16,
respectively. The comparison seems to be justified by the fact that typical
phonological development of consonants ends at around the age of 5. Scores
of the SLI participants should not correlate with age, as their development is
assumed to have ended by the age of 7. In order to verify this assumption, two control
groups of children aged 7 and 11 were also tested and showed results at ceiling.
This suggests that the phonological development of children
with SLI more closely resembles that of younger children in whom phonological
development is still in progress, and their errors should therefore be
comparable to those of children aged 3 to 4. Furthermore, as typical phonological
development between ages 3;0 and 4;11 is exceedingly rapid, we deemed it
prudent to study these children in groups spanning no more than 12 months.
Based on these considerations, we therefore compare groups of participants
with SLI with a relatively wide age range to groups of TD participants with a
narrow age range.

3. Children with SLI show no intellectual, hearing, social or affective deficit, no brain
injury and no developmental disorder, but they show a strong deficit in verbal
capacities, significant in light of established standards for their age (Gérard 1993;
Leonard 1998).
A Shapiro-Wilk normality test was conducted in order to test the normal
distribution of our two main groups (TD and SLI) on 18 main variables,
including overall test success rate. The distribution is considered normal
if p > .05. For the SLI group the distribution was normal for half of the tested
variables, including success rate on the test (W = .95783, p = .30937), while
for the TD group the distribution was normal for only three variables, and
non-normal for success rate on the test (W = .80047, p = .00007). We therefore
decided to use non-parametric tests for the statistical analysis of
our results.
To verify the validity of our sub-groups based on developmental considerations,
we conducted a Kruskal-Wallis ANOVA. The Kruskal-Wallis test
showed a prominent group effect on the variable test success rate (H = 23.89,
p = .000). The effect was found on every tested variable except three, confirming
the importance of the sub-groups in the two types of development.
To further clarify these outcomes by showing an effect between the two
types of development (TD versus SLI), a Mann-Whitney test was conducted
on the variable development. Results showed a significant difference between
the TD and SLI groups (U = 294, Z = 1.963167, p = 0.04).
Finally, a Spearman R test was carried out to evaluate the correlation
between age and test success rate, and thus to validate our comparison of
single-year age ranges for the two groups of TD children with the wide
age ranges for children with SLI. Results showed no correlation between
test success rate and age in children with SLI (r = 0.205988), but a positive
correlation in the TD group (r = 0.617706). This implies that for the typically-
developing participants age is a determining factor for success rate, while, in
conformity with our hypotheses, age is not important for participants with SLI.
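The Spearman rank correlation used here to contrast the two developmental profiles can be computed in a few lines of pure Python. The sketch below uses invented toy scores (not the study's data) and the tie-free difference-of-ranks formula:

```python
def ranks(xs):
    """Rank values from 1..n (no tie handling, for illustration)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(xs, ys):
    """Spearman rank correlation via the formula
    rho = 1 - 6*sum(d^2) / (n*(n^2-1)), valid when there are no ties."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Toy data: success rate rises with age in a TD-like group ...
ages = [3.0, 3.3, 3.6, 4.0, 4.3, 4.6]
td_scores = [0.55, 0.60, 0.68, 0.75, 0.83, 0.90]
# ... but not in an SLI-like group.
sli_scores = [0.60, 0.58, 0.62, 0.57, 0.61, 0.59]

print(spearman_rho(ages, td_scores))   # 1.0 (perfectly monotone)
print(spearman_rho(ages, sli_scores))  # close to 0: no age effect
```

A monotone age-score relation yields rho near 1 (the TD-like pattern), while a flat relation yields rho near 0 (the SLI-like pattern), mirroring the contrast reported above.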
3.3. Procedure
The participants were tested individually in a quiet room and were asked to
repeat each item after the experimenter. The subjects were told at the beginning
of the session that some of the items were non-words, but knew that
they had to repeat every item regardless. In order to verify whether the task
had been understood correctly, five practice items were presented before the
start of the experiment. Each item was presented once, and in the same order
for all participants. Responses were recorded on a Zoom H4 audio recorder
and transcribed using broad phonetic transcription in IPA. Each transcription
was verified by another expert transcriber. The responses were coded in CHAT
format (CHILDES system, MacWhinney 2000). Each sample was hand-coded
by a first coder and subsequently verified by a second expert coder. Productions
were coded and extracted according to each variable targeted (i.e. position in the
word, target consonant, size of the cluster, and word vs. non-word) on a target
line in parallel with a produced line. Errors were coded on a specific error line.
Figure 8. Success rate in the production of single consonants in three positions: Initial
1 versus Medial 1 versus Final 1 (as % of total of correct production per
type of position for each group of speakers).
Figure 9. Success Rate for Initial and Intervocalic Position (as % of total of correct
production per type of position for each group of speakers).
Note that the standard deviation also increased with the number of
consonants, in particular for participants with SLI. Heterogeneity is a
well-known characteristic of groups with SLI and especially surfaces in complex areas.
Inter-group comparison shows that children aged 3 behave similarly to participants
with SLI, and no significant differences between them and the two
SLI groups could be found, whereas children aged 4 performed significantly
better than the SLI 7-10 group (p < 0.05 for Intervocalic 3 and p < 0.0001 for
Initial 3). Thus, speakers with SLI more closely resemble typically-developing
3-year-olds than typically-developing 4-year-olds when confronted
with the production of more complex consonant structures.
Figure 10. Success Rate for Final Clusters (as % of total of correct productions per
type of position for each group of speakers).
An interesting point to note concerns the productions of final clusters with the
sonorant preceding the obstruent (as in /paʁt/, Final 1+) as compared
to those in which the sonorant is in second position (as in /patʁ/, Final 2) by
the SLI 7-10 group. This group produced these two consonant sequences in
the same way (Final 1+, 69.8%, SD 38.7; Final 2, 68.5%, SD 31.7), without
any significant difference (Z = 1.2, p = 0.2). Note also that the standard deviations
show great heterogeneity in the way in which participants treated these
clusters. TD-3 produced clusters like /pat/ (86%, SD 17.7) better than clusters
like /pat/ (81%, SD 18.5) (Z = 1.09, p = 0.27). In contrast, adolescents
with SLI produced final clusters like /pat/ (86.5%, SD 21.5) significantly better
than those like /pat/ (75.4%, SD 21.8; Z = 2.2, p < 0.05).
These results support the hypothesis that the origin of phonological complexity
lies beyond the mere number of elements in the cluster. In both types
of sequence, the cluster consists of two consonants, a sonorant and an
obstruent, but the difference between the sequences is the order of the consonants
in the cluster, and (therefore) the way they are associated to the syllable
(figure 11).
Figure 12. Omission and substitution of target consonants (as % of the total of all
strategies used).
substitution (38%). Notice that the difference between 3-year-old children and
adolescents with SLI is highly significant for the use of consonant omission
(U = 63, Z = 2.55, p < 0.01). Figure 12 suggests that substitution becomes
more frequent with age, since children at age 4 use it more than children at age 3 or
speakers with SLI (substitution: TD-4, 48%; SLI 7-10, 42.6%; SLI 11-
16, 44.75%; omission: TD-4, 26.8%; SLI 7-10, 26.7%; SLI 11-16, 23.1%).
This suggests that syllabic structures in children with SLI are complete (i.e.
have all the required syllabic constituents), but that the difficulty is centered on
segmental association. Indeed, omitting a segment can be interpreted as the
absence of a syllabic constituent. On the other hand, substitution allows the
speaker to adapt the segmental content to the syllable according to the association
constraints that weigh on these structures at a given stage of development.
For example, /j/ is very poorly produced in a context like /gajt/ by
all groups (success rate: TD-3, 14%; TD-4, 50%; SLI 7-10, 22%; SLI 11-16,
53%) and is regularly substituted by a vowel (/gajt/ → /gat/). This type of
substitution is a reflection of the ambiguity created by such a consonantal
sequence. The glide /j/ tends to behave like a consonant in French (Pourin
2003). Its association with position 3 is thus still problematic for the most
fragile speakers and suggests that the association constraint governing this
position is especially difficult to process.
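The omission/substitution coding discussed above can be approximated by comparing the consonant sequences of target and produced forms. A rough sketch (the segment inventory, the naive positional comparison and the example forms are illustrative assumptions, not the study's actual coding scheme):

```python
# Illustrative consonant inventory (ASCII stand-ins for IPA segments).
CONSONANTS = set("bdfgjklmnprstvwz")

def consonants(form):
    """Extract the consonant skeleton of a transcribed form."""
    return [seg for seg in form if seg in CONSONANTS]

def classify(target, produced):
    """Classify a response: 'omission' if a consonant was dropped,
    'substitution' if the count matches but a consonant changed,
    'correct' otherwise. (Naive positional comparison, no alignment.)"""
    t, p = consonants(target), consonants(produced)
    if len(p) < len(t):
        return "omission"
    if any(a != b for a, b in zip(t, p)):
        return "substitution"
    return "correct"

print(classify("prat", "pat"))  # omission: the liquid is dropped
print(classify("pat", "pak"))   # substitution: final consonant changed
```

A real coding scheme would of course align segments more carefully and handle the further strategies (lexicalization, chaotic disturbances) discussed below, but the omission/substitution contrast reduces to this count-versus-content comparison.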
Besides the predominant use of omission and substitution, children also
used other strategies, to a lesser extent, but with a remarkable frequency in
speakers with SLI. Moreover, adolescents with SLI used significantly more of
these other strategies than the TD-3 group (U = 78.5, Z = 1.98, p < 0.05)
and the TD-4 group (U = 80.5, Z = 2.37, p < 0.05). We take this to mean
that adolescents with SLI developed the use of those strategies in order to
produce meaning. In other words, these other strategies function as direct
compensatory strategies for their deficit.

Figure 13. Other strategies used to replace target consonants (as % of total of all
other strategies).
The use of chaotic disturbances by speakers with SLI (SLI 7-10, 7%; SLI
11-16, 5.5%) is an interesting phenomenon. This strategy is entirely missing in
TD children aged 4, and less prevalent in children aged 3 (3%), though there is
no significant difference in the relative frequency of disturbances in the TD-3 group
as compared to the two SLI groups (TD-3 vs. SLI 7-10, U = 46, Z = 1.13,
p = 0.19; vs. SLI 11-16, U = 102, Z = 1.34, p = 0.18), and no significant difference
between the two SLI groups themselves (SLI 7-10 vs. SLI 11-16, U = 80,
Z = 0.3, p = 0.76).
What is surprising about the use of these strategies is that they do not
necessarily simplify the syllabic structures. Indeed, if lexicalization is a means
to circumvent a lexical difficulty (i.e. an unknown word), gliding from a non-
word to a word is not always a phonological simplification (e.g. /gas/ → /kask/
'helmet')4. As for structure disturbances, these consist of a total disorganization
of the consonantal and syllabic structure: the consonantal sequence seems
to be produced at random, and the resulting form is sufficiently far from the
target to make the phonological analysis of the processes at work extremely
difficult (/kapa/ → /tadba/). These disturbances mostly affect consonants,
very rarely vowels. Moreover, as phonological complexity grows in the word,
the rate of production of such phenomena increases as well (Carms 2007).
These strategies seem to reveal the difficulty speakers with SLI have in dealing
with phonological complexity (see Ferré et al. (2010) for a more complete
analysis of those strategies, in particular lexicalization and chaotic disturbances).
Our results converge on the idea that the difficulty is located at the level of
the association constraints between the segmental and the syllabic lines. Consequently,
this difficulty increases as the number of elements increases. The
analysis of compensatory strategies completes the study of consonant clusters:
success rate is closely linked to the ability to connect segments and syllables,
in other words, to the capacity to implement association constraints.
5. Conclusion
4. In Ferré et al. (2010), we hypothesize that the significant increase in the use of
lexicalization in older children with SLI is due to the fact that they are no longer
manipulating phonological structures with which they have difficulties, but rather
are producing the most efficient lexical items they can. Lexicalization is therefore a
way for adolescents with SLI to overcome the difficulty, and they use their lexical
stock to this end.
References
Angoujard, Jean-Pierre
1997 Théorie de la syllabe. Paris: CNRS Éditions.
Blevins, Juliette
1995 The syllable in phonological theory. In John Goldsmith (ed.), The
handbook of phonological theory. Oxford: Blackwell, 245–306.
Bortolini, Umberta and Lawrence Leonard
2000 Phonology and children with specific language impairment: status of
structural constraints in two languages. Journal of Communication
Disorders 33:2, 131–150.
Carms, E.
2007 Traitement de la complexité phonologique chez les adolescents dysphasiques.
Master 2 dissertation, Research in Cognition and Development.
University of François-Rabelais, Tours.
Chevrie-Muller, Claude, A.M. Simon and P. Decante
1981 Épreuves pour l'examen du langage (EPEL). Paris: Éditions du
Centre de Psychologie Appliquée.
Cyran, Eugeniusz
2003 Complexity Scales and Licensing Strength in Phonology. Lublin:
Wydawnictwo KUL.
Deltour, Jean-Jacques and Dominique Hupkens
1980 Test de vocabulaire actif et passif pour enfants de 5 à 8 ans (TVAP
5-8). Braine-le-Château: Éditions de l'Application des Techniques
Modernes (ATM).
Demuth, Katherine and Margaret Kehoe
2006 The acquisition of word-final clusters in French. Journal of Catalan
Linguistics 5, 59–81.
Ferré, Sandrine, Laurice Tuller, Anne-Gaëlle Piller and Marie-Anne Barthez
2010 Strategies of avoidance in (a)typical development of French. In L.
Dominguez and P. Guijarro-Fuentes (eds.), Selected proceedings of
the Romance Turn III. Cambridge: Cambridge Scholars Publishing.
Féry, Caroline
2001 Markedness, faithfulness, vowel quality and syllable structure in
French. Linguistics in Potsdam 16, 1–31.
Gallon, Nichola, John Harris and Heather van der Lely
2007 Non-word repetition: an investigation of phonological complexity in
children with Grammatical SLI. Clinical Linguistics & Phonetics 21,
435–455.
Gathercole, Susan and Alan D. Baddeley
1990 The role of phonological memory in vocabulary acquisition: A study
of young children learning new names. British Journal of Psychology
81:4, 439–454.
Gérard, Christophe-Loïc
1993 L'Enfant dysphasique. Bruxelles: De Boeck Université.
Gierut, Judith, Michèle Morrisette, Mary Hughes, and Susan Rowland
1996 Phonological treatment efficacy and developmental norms. Language,
Speech and Hearing Services in Schools 27, 215–230.
Harris, John and Edmund Gussmann
1998 Final codas: why the west was wrong. In Eugeniusz Cyran (ed.),
Structure and interpretation in phonology: studies in phonology.
Lublin: Folia, 139–162.
Hayes, Bruce and Donca Steriade
2004 Introduction: The Phonetic Basis of Phonological Markedness. In
Bruce Hayes, Robert Kirchner and Donca Steriade (eds.), Phonetically-
Based Phonology. Cambridge: Cambridge University Press,
1–32.
Ingram, David
1981 Procedures for the phonological analysis of children's language.
Baltimore: University Park Press.
Ingram, David
1989 First Language Acquisition. Cambridge: Cambridge University
Press.
Acquiring and avoiding phonological complexity in SLI 307
Marshall, Chloe, Susan Ebbels, John Harris and Heather van der Lely
2002 Investigating the impact of prosodic complexity on the speech of
children with Specic Language Impairment. In R. Vermeulen and
A. Neeleman (eds), UCL Working Papers in Linguistics 14, 4366.
Orsolini, Margherita, Enzo Sechi, Cristina Maronato, Elisabetta Bonvino and Alessandra
Corcelli
2001 The nature of phonological delay in children with specic language
impairment. International Journal of Language and Communication
Disorders 1, 6390.
Pellegrino, Franois
2009 De lidentication des langues la complexit phonologique. Habil-
itation diriger des recherches, Sciences du Langage, Universit
Lumire Lyon 2.
Pourin, Delphine
2003 tude phonologique dclarative des semi-voyelles du franais. As-
pects synchroniques et diachroniques, Ph.D. dissertation, University
of Nantes.
Sahlen, Brigitta, Christina Reuterskiold-Wagner, Ulrika Nettelbladt and Karl Radeborg
1999 Non-word repetition in children with language impairment: pitfalls
and possibilities. International Journal of Language and Communi-
cation Disorders 34:3, 337352.
Trubetskoy, Nikolay
1931 Gedanken ber Morphonology. Travaux de Cercle Linguistique de
Prague 4, 5361.
van der Lely, Heather and John Harris
1999 The Test of Phonological Structure. London UK: University College
London. Unpublished test available from the rst author, Centre for
Developmental Language Disorders and Cognitive Neuroscience,
Department of Human Communication Science.
Vihman, Marilyn
1996 Phonological development: the origins of language in the child.
Cambridge: Blackwell.
Winitz, Harris
1969 Articulatory Acquisition and Behavior. New York: Appleton-Century-
Crofts.
Zipf, George Kingsley
1935 The Psycho-Biology of Language: An Introduction to Dynamic Phi-
lology, Cambridge: MIT Press.
Part IV. Assimilation and reduction in connected
speech
Articulatory reduction and assimilation in n#g
sequences in complex words in German1
Pia Bergmann
Abstract

This paper investigates alveolar-to-velar assimilation in nasal#stop sequences across phonological word boundaries in complex words in German by means of electropalatography (EPG). Independent variables are word frequency, accentuation, and vowel quantity in the first part of the complex word. We present evidence for gradient reduction as well as categorical deletion of the alveolar nasal. Word frequency, vowel quantity and accentuation influence articulatory reduction of the alveolar nasal significantly in particle verbs, while compounds are less affected by the independent variables. Progressive and conservative speakers were identified with respect to assimilation, as well as speaker-specific assimilatory strategies.
1. Introduction

This paper deals with the influence of lexical frequency and prosodic structure on articulatory reduction of n#g-sequences in complex words in German, e.g. in words like ein#geben 'to enter' vs. ein#gelen 'to gel in'. The main research questions are whether morphologically complex high-frequency lexical items are produced with weaker internal prosodic boundaries than low-frequency items, and whether accented items are more protected against boundary weakening than unaccented items. These questions were answered by using acoustic and articulatory (EPG) methods to investigate six speakers' production of test and control items embedded into carrier sentences. Speaker-specific assimilatory strategies will be discussed by presenting the productions of three speakers
1. Thanks to two anonymous reviewers, Phil Hoole, and Peter Auer for many helpful comments on a previous version of the paper. Furthermore, I want to thank Doris Mücke, Martine Grice and Marion Jaeger for their help with speech material selection and data analysis, as well as Raphaela Kirst for labelling most of the data. I am grateful to Sascha Wolfer for helping me with the statistical analysis.
This study is part of a larger project on frequency effects on assimilation and other edge-marking phenomena funded by the DFG, Priority Programme 1234: Edge marking in German compounds: Frequency effects and prosodic constituents (AU72/181), 2006–2009.
in detail. In this section we will first introduce the notion of the phonological word and relate it to the aspects of frequency and prosodic structure. We will then report current findings on speaker-specific behaviour in reduction and assimilation and finally explain the chosen dependent variables.

In the generative framework of prosodic phonology, the phonological or prosodic word (henceforth pword) is the domain that maps morphological entities onto phonological/prosodic entities (Nespor & Vogel 2007). A pword boundary can block the application of phonological rules, for example resyllabification as a means of syllable onset maximization. Consider the morphologically complex word gier + ig 'greedy', which is syllabified as gie.rig despite its internal morphological boundary, and thus is considered to constitute one phonological word: (gie.rig)ω. In the word lieb + lich 'mellow', however, resyllabification across the morphological boundary is blocked (*lie.blich), so that the string is analyzed as consisting of two separate phonological words: (lieb)ω (lich)ω (cf. Hall 1999; Lühken 1997; Wiese 1996). Likewise, the pword boundary is relevant for the occurrence of assimilations in n#g-sequences: In German, velar nasal assimilation is obligatory in word-internal /ng/ or /nk/-sequences, but it is optional across the boundary of the phonological word, e.g. [ˈʊŋgarn] 'Hungary' vs. [ˈʊn#gɛrn] or [ˈʊŋ#gɛrn] 'reluctantly'. Although external factors like speaking style or speech rate may constrain the occurrence of nasal velarization in these cases (cf. Wiese 1996), this type of assimilation is considered to be a rule-based process that should apply equally to each lexical item. This view is challenged by usage-based approaches to language. These share the assumption that aspects of language use may influence the way in which language is produced, perceived and maybe even processed and mentally stored. Therefore, lexical items with different performance characteristics may be treated differently in a systematic way, and phonological processes do not have to apply across-the-board.
One important performance characteristic is lexical frequency, which has been widely discussed within usage-based approaches. With regard to this paper, it has been shown that assimilation and reduction of speech sounds are enhanced by frequency of occurrence (e.g. Bush 2001; Bybee 2001, 2006; Phillips 2006; Pierrehumbert 2001). Relatively few studies have investigated frequency effects on segmental reduction from a production point of view by articulatory methods, however. Jaeger & Hoole (2011) present evidence from an EMA study for stronger articulatory reduction of tongue tip movement, i.e. the alveolar nasal, in /n#k/ sequences, when the first segment is the right edge of a high-frequency function word (dann#kann) as compared to a collocation with a low-frequency content word in first position (Zahn#kann). They suggest, however, that it is the co-occurrence frequency of the string dann#kann
rather than the lexical frequency of the single item dann which is responsible for assimilation effects. In an EPG study, Stephenson (2003) reports frequency effects on alveolar-to-velar assimilation in stop-stop sequences across a word boundary in English compounds. Her findings show that speakers react differently to high-frequency words. While some speakers reduce the duration of the segment sequence, other speakers change the place of articulation. Mücke et al. (2008) found significant effects of word frequency on n#g-sequences for only two out of five speakers. Both speakers make use of temporal variables (temporal reduction or deletion of alveolar constriction). Kirst (2008), on the other hand, did not find significant effects of word frequency on n#g / n#k-sequences in her EPG study of two speakers.2 In the present study, we approach this question by comparing the production of high-frequency items to that of low-frequency items.

In addition to frequency, prosodic structure will be taken into account as an independent variable for assimilation and reduction. Prosodic theories agree on the fact that the continuous speech stream is subdivided into hierarchically organized prosodic constituents by means of phonological and/or phonetic characteristics (cf. Cho et al. 2007; Fougeron & Keating 1997; Keating et al. 2003; Kuzla 2009; Nespor & Vogel 2007). For instance, as mentioned above, assimilation in German can be blocked by a phonological word boundary. The lack of a possible assimilation therefore serves as a boundary marker for the prosodic structure, in this case for the constituent phonological word. Boundaries of prosodic domains have received a rather large amount of attention in prosodic research, showing that domain-initial segments as well as the heads of domains are articulatorily strengthened (cf. Keating et al. 2003). Domain-final elements, on the other hand, tend to be weakened and vulnerable to assimilations or reductions, which is especially true for coronals (cf. Kohler 1976, 1990). The relation of these findings to the present study is twofold: First, we want to test the influence of accentuation on the occurrence of assimilations and reductions. According to the literature, accented items, being the heads of intonational phrases, are expected to be less reduced than unaccented items. Second, in this study assimilation and reduction of the /n/ in the /n#g/-sequence will be regarded as a weakening of the pword boundary. Here, the aspects of prosodic structure and lexical frequency both play a role: Since the clear separation of the word constituents is supposed to ease lexical access,

2. Both studies overlap with the present study in that the speech material is partly spoken by the same speakers and partly consists of the same test words. The speech material differs with respect to prosodic conditions and the segmental contexts, which are more restricted in Mücke et al. (2008) and Kirst (2008).
frequency hit at the time of the recording. (The gap in the test item set is due to the fact that there was no high-frequency compound available for this context; the vowel length effect can thus only be tested with the particle verbs.)

All test and control items were embedded into carrier sentences. The sentences with the test items varied with respect to prosodic structure so that each test item occurred in accented position as well as in unaccented position. Deaccentuation was achieved by manipulating the information structure of the test sentences, specifically by introducing a negation particle before the test item (see sentences b and d below). We constructed a context consisting of a question-answer pair for the test items and control_1 items. Sentence position of the test or control item was kept constant so as not to interfere with positional effects, especially final lengthening or glottalization at the IP boundary. All test items occur as the last syntactic constituent in a prepositioned if-clause (wenn-Satz), thereby triggering non-final rising intonation. The items across a syntactic boundary and control_2 items were embedded as last lexical elements into questions so that they were also produced with rising intonation. Three repetitions of each target word were collected. Accentuation was varied
for test items only in order to keep the corpus to a feasible size.3 No dummy sentences were included. All in all, the sample comprises 81 tokens per speaker (n = 486). The sentences below exemplify the test sentences for the high-frequency test item ein#geben and the low-frequency test item ein#gelen, both in accented and unaccented position, as well as the test sentences for control_1 items.
EINGEBEN and EINGELEN:
a. HF_acc. [Warum soll ich mir den Zahlenkode merken?]
Wenn du ihn EINGEBEN kannst, geht die Tür automatisch auf.
(Why should I take note of the number code?)
(If you can enter it, the door will open automatically.)
b. HF_unacc. [Warum müssen wir den richtigen Kode zum Öffnen der Tür eintippen?]
Wenn wir ihn NICHT eingeben, geht die Tür nicht auf.
(Why do we have to enter the correct code to open the door?)
(If we don't enter it, the door won't open.)
c. LF_acc. [Warum stylst du deine Haare so auf?]
Wenn man sie EINGELT, hält die Frisur länger.
(Why are you styling your hair like that?)
(If you gel it, the hairstyle holds longer.)
d. LF_unacc. [Warum stylst du deine Haare so auf?]
Wenn man sie NICHT eingelt, hält die Frisur nicht so lange.
(Why are you styling your hair like that?)
(If you don't gel it, the hairstyle won't hold as long.)
e. control 1.1: Wollen wir ihn an dem Abend EINEHREN?
(Shall we honour him on that evening?)
g. control 1.2: Sollen wir den Weg FREIKEHREN?
(Should we sweep the path clear?)
For the recordings speakers were seated in a sound-proof room. The data
were presented visually in random order on a computer screen; the surround-
ing context of the carrier sentence was also presented auditorily. Speakers had
3. The speech material that had to be read out loud contained another subset of test
and control items for t#g/k sequences, which is not part of the present study. Addi-
tionally, the material consisted of items for the segment sequences n#b/p, t#b/p,
#, and s#. These were recorded in separate sessions, though.
to read the sentences that were given in bold letters out loud. All speakers had
a short training phase before the recording which contained ten sentences with
the same test design as the test items. This was done primarily to ensure that
the items in the unaccented condition were produced correctly. If a speaker
failed to produce these items correctly, he/she was asked to read the sentence
again and highlight the negation particle.
Constriction overlap (dotted line) was calculated by subtracting the end of the alveolar constriction (2) from the start of the velar constriction (1). Overlap therefore yields negative values, whereas a lag between constriction phases yields positive values. We calculated the ratio of the velar constriction in the acoustic nasal (broken line) by subtracting the release of the alveolar constriction from the end of the acoustic nasal and dividing the result by the duration of C1 (((3-2)*100)/C1). This measurement indicates the amount of velar nasal in C1.
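The two measures just described can be sketched in a few lines of Python. The function names and the example landmark times are hypothetical illustrations, not part of the study's actual analysis scripts; landmarks follow the numbering in the text (1 = start of velar constriction, 2 = end/release of alveolar constriction, 3 = end of the acoustic nasal).

```python
# Sketch of the CO and VN measures defined above (times in ms).
# All names and example values are hypothetical.

def constriction_overlap(velar_start, alveolar_end):
    """CO: negative = overlap of the two constrictions, positive = lag."""
    return velar_start - alveolar_end

def velar_nasal_ratio(alveolar_release, nasal_end, c1_duration):
    """VN: percentage of the acoustic nasal (C1) produced with velar contact."""
    return (nasal_end - alveolar_release) * 100 / c1_duration

# Example: velar closure begins 15 ms before the alveolar release,
# and the nasal continues 40 ms past the release in a 100 ms C1.
print(constriction_overlap(85, 100))     # -15 -> overlap
print(velar_nasal_ratio(100, 140, 100))  # 40.0 -> 40% velar nasal in C1
```

A deleted alveolar constriction (section 3) has no landmark 2, so neither measure is defined for such tokens; this is why the data base for the gradient measures shrinks when deletions are frequent.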
Assistant Software. The relevant region for an alveolar segment was defined as rows 1 to 3, and columns 2 to 7. The velar region was defined as rows 7 to 8, and columns 3 to 6 (cf. fig. 3). An alveolar constriction was labelled for all frames in which any of the rows in the alveolar area was closed. A velar constriction was labelled for all frames in which 80% of the relevant area was closed. We introduced this threshold because many of the test and velar control items had no complete closure in the relevant area, presumably due to retraction of the velar contact beyond the reach of the artificial palate.
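As a rough sketch, the labelling criteria above can be expressed as predicates over an 8×8 contact frame. The frame representation and helper names are assumptions for illustration only, not the actual Articulate Assistant routines; row/column indices follow the 1-based convention in the text.

```python
# Frame-labelling criteria described above, for an 8x8 EPG frame
# given as a list of 8 rows of 8 ints (1 = electrode contact).
ALV_ROWS, ALV_COLS = range(1, 4), range(2, 8)   # rows 1-3, columns 2-7
VEL_ROWS, VEL_COLS = range(7, 9), range(3, 7)   # rows 7-8, columns 3-6

def has_alveolar_constriction(frame):
    """True if any row of the alveolar region is fully closed."""
    return any(all(frame[r - 1][c - 1] for c in ALV_COLS) for r in ALV_ROWS)

def has_velar_constriction(frame, threshold=0.8):
    """True if at least 80% of the velar region shows contact
    (tolerating incomplete closure behind the artificial palate)."""
    cells = [frame[r - 1][c - 1] for r in VEL_ROWS for c in VEL_COLS]
    return sum(cells) / len(cells) >= threshold

# A frame with a complete closure in row 1 and 7 of the 8 velar cells closed:
frame = [[0] * 8 for _ in range(8)]
for c in ALV_COLS:
    frame[0][c - 1] = 1
for r in VEL_ROWS:
    for c in VEL_COLS:
        frame[r - 1][c - 1] = 1
frame[7][5] = 0   # one velar cell open: 7/8 = 87.5% still passes the threshold
print(has_alveolar_constriction(frame), has_velar_constriction(frame))
```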
2.5. Hypotheses
Our hypotheses are that high frequency leads to stronger reduction and assimi-
lation than low frequency. Unaccented items will be more reduced and assimi-
lated than accented items, and we suppose the same to hold for long-vowel
items compared to short-vowel items. In detail, we hypothesize that
(1) the number of deletions of the alveolar closure is higher in high-
frequency items, unaccented items, and long-vowel items,
(2a) the gradient durational measurements SD, ACD, and VN decrease with
high-frequency items, unaccented items, and long-vowel items,
(2b) the gradient durational measurement CO increases with high-frequency
items, unaccented items, and long-vowel items,
(3a) speakers will have shorter and less linguo-palatal contact in the alveolar
region when moving from one end of the reduction scale to the other,
(3b) speakers differ with respect to their assimilatory strategies.
3. Results

Figure 4. Deletion of alveolar closure across speakers.
Figure 5. Deletion of alveolar closure plotted against vowel quantity.

short vowel, but not in those with a long vowel: In particle verbs with a short vowel, deletions of the alveolar closure occur significantly more often in unaccented items (χ²(1) = 4.15, p < 0.05), and in high-frequency items (χ²(1) = 6.76, p < 0.05). Moreover, the number of deletions increases significantly with the reduction scale (χ²(3) = 11.16, p < 0.05). Figures 6 and 7 illustrate the distribution of the deletions of the alveolar closure according to the reduction scale for particle verbs with a long vowel (fig. 6), and particle verbs with a short vowel (fig. 7). The reduction scale combines the independent variables accentuation and frequency and ranges from low-frequency words in accented position to

Figure 6. Deletion of alveolar closure plotted against the reduction scale (V:, n = 72).
Figure 7. Deletion of alveolar closure plotted against the reduction scale (V, n = 71).
Figure 8. Interaction of frequency with vowel quantity in particle verbs (ACD).
Figure 9. Interaction of accentuation with vowel quantity in particle verbs (ACD).

high-frequency words in unaccented position (cf. section 2.3 for a more detailed explanation of the reduction scale).
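For a 2×2 table (e.g. deletion vs. retention of the alveolar closure in unaccented vs. accented items), the kind of χ² statistic reported above can be computed directly. This is a minimal sketch; the counts below are invented for illustration and are not the study's data.

```python
# 2x2 chi-square test of independence (without continuity correction),
# the kind of test reported above for deletions by accentuation.
# The counts are invented for illustration only.

def chi_square_2x2(a, b, c, d):
    """chi2 for the table [[a, b], [c, d]], e.g. deleted/retained
    closure in unaccented (a, b) vs. accented (c, d) items."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: 14/36 deletions in unaccented vs. 5/35 in accented items.
chi2 = chi_square_2x2(14, 22, 5, 30)
print(round(chi2, 2))  # exceeds the 3.84 criterion for p < 0.05 at df = 1
```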
3.1.2. Gradient reduction and assimilation

The next section deals with the results of the statistical analysis of the gradient dependent variables alveolar constriction duration (ACD), segment sequence duration (SD), ratio of velar constriction in [n] (VN) and constriction overlap (CO).

Figure 10. Reduction scales for ACD (error bars: 95% CI).
Figure 11. Interaction of frequency with vowel quantity in particle verbs (SD).
Figure 13. Reduction scales for VN (particle verbs, error bars: 95% CI).
Figure 14. Interaction of frequency with vowel quantity in particle verbs (CO).
Figure 15. Reduction scale for CO (compounds, error bars: 95% CI; negative values = overlap, positive values = no overlap).
3.1.2.5. Summary

To summarize, the independent variable vowel quantity had significant main effects on all dependent variables except for CO (constriction overlap). Test items with short vowels are characterized by longer durations of alveolar constrictions and segment sequences, as well as lower percentages of velarity in C1. They additionally have significantly fewer deletions of the alveolar constriction phase, as compared to the test items with a long vowel.
Rows 4 to 6 present the EPG pattern of the linguo-palatal contact in the n#g-sequence for each realisation of the test item. The EPG pattern thus contains the contact for the whole alveolar#velar sequence; the numbers given for each single contact represent the percentage of sustained contact in the sequence. Row 7 finally demonstrates the EPG patterns for all realisations of the respective control item ([n]). Since the cumulative pattern refers to the whole segment sequence, the percentages are not directly comparable to the control items. In an ideal case, we would expect percentage values for the test items to be around 50%, whereas those for the control items should be near 100% in the alveolar region. Rather than taking into account these ideal values, we will refer to the percentage values only when comparing the different realisations of the test item. When drawing comparisons with the control items, only the location of the contact, but not its duration, will be taken into account.

Figure 16 presents the short-vowel particle verbs produced by speaker KA:
Figure 22. Velar controls for DM with least contact (a) and most contact (b).
Figure 23. Velar controls for KR with least contact (a) and most contact (b).
DM's sequence differs considerably from that of KR (cf. fig. 25): There are no dynamic changes in the alveolar region with the exception of one contact (row 2, left column 2) that is deleted in palate 449. Lateral contact up to row 2 is kept constant until the end of the n#g-sequence. This pattern does not resemble any of the velar control items of speaker DM. (This is not to deny that some of the realisations of DM resemble her velar control items [cf. the accented high-frequency long-vowel particle verbs], but most of the realisations are static compared to the realisations of KR. This can be seen when taking into account the percentages in the cumulative EPG palates of figures 18 to 21.)

Concluding the comparison between DM and KR, we can say that although both are progressive with respect to deletions (after long vowels at least), the two speakers have different assimilatory strategies. DM often produces a segment that resembles neither the typical alveolar nor the velar segment (as represented by the control items), and sustains this segment statically. KR on the other hand seems to shift dynamically from more front contact to back contact. It must be pointed out that DM produces blends between the alveolar and velar segments as well as segments that are similar to her velar control items. This indicates that DM applies categorical assimilations (i.e. the velar segment) as well as gradient assimilations.

Comparing KR to KA, we can state that they differ considerably too, despite their similar number of alveolar closure deletions. KA has comparable strategies for long-vowel and short-vowel particle verbs. She reduces contact gradiently (more or less) along the reduction scale. KR on the other hand has hardly any reductions in short-vowel particle verbs, but many more reductions and deletions in long-vowel particle verbs, so that she seems conservative in one part, but progressive in the other.
The aim of this study was to investigate the influence of the independent variables frequency, accentuation and vowel quantity on the occurrence of reduction and assimilation in n#g sequences in binary complex words in German. Additionally, we were interested in speaker-specific strategies for reduction and assimilation.

Our findings show that most of our hypotheses stated in section 2.5 are generally confirmed: Statistical analysis for all speakers yielded significant results for some of the dependent variables for frequency, accentuation and vowel quantity in the expected direction, i.e. we encounter more reduction and assimilation in high-frequency items, unaccented items and items with a long vowel. With respect to the investigated dependent variables, we observe that alveolar constriction duration (ACD), segment sequence duration (SD), and velar nasal (VN) are affected by all independent variables, whereas constriction overlap (CO) is only affected by frequency. Frequency effects on CO are hardly straightforward, though. This may partly be due to the fact that the data base for statistical analysis was strongly diminished by a high number of alveolar and velar constriction deletions (cf. 3.1.2.4). Deletion of the alveolar constriction is affected by vowel quantity, varied significantly across speakers, and showed a significant distribution along the reduction scale. There are some interesting restrictions to these general findings, though. First, our results show that lexical category has a major role to play in reduction and assimilation: Particle verbs are influenced by all independent variables and for all dependent variables except for CO, whereas compounds are more conservative and are significantly influenced by accentuation
for the durational variables ACD and SD only, as well as by frequency for CO. Second, within the particle verbs, the observed effects for accentuation and frequency are attributable to particle verbs with a short vowel only. Since particle verbs with a long vowel have significantly more deletions of the alveolar constriction and more durational reductions than their short-vowel counterparts, we hypothesize that the reductions are simply too strong to leave any room for effects of accentuation and frequency. These can only occur in short-vowel particle verbs, where the nasal segment is long enough to allow a range for variation. The stronger reluctance of noun compounds to undergo reductions and assimilations across the word boundary can be interpreted as an effect of the type of words that enter the composition: While noun compounds consist of two content words, particle verbs are composed of a function word and a content word. The first part of the complex word may therefore be more vulnerable to reductional phenomena in particle verbs. Moreover, the investigated particle verbs are more frequent than the noun compound, which supports the higher degree of reduction. In this respect the introduction of the reduction scale proved to be useful. It enables us to demonstrate the effects of accentuation and frequency for each word group separately and yielded significant results for the distribution of deletions of the alveolar constriction in particle verbs with a short vowel.

To conclude, accentuation has a significant impact on the articulatory behaviour of our subjects, even in noun compounds, thereby corroborating many similar findings in the realm of prosodic phonology. Vowel quantity has the most robust effect on the gradient durational variables as well as on the categorical variable (cf. Bergmann subm. for similar results on the reduction of geminates across the pword boundary). The result for the durational variables can be interpreted as a compensatory shortening or lengthening of the last segment with respect to the domain of the syllable, which fits well into a non-hierarchical model of syllable structure (cf. Clements & Keyser 1983). This cannot explain, however, why speakers consistently delete the alveolar closure less often in short-vowel items than in long-vowel items, and why they produce more velarity in the nasal segment after long vowels than after short vowels. The stronger assimilation of the nasal segment after long vowels hints at an articulatory reduction that can be better explained in a hierarchical constituent model of syllable structure, where the segment after a diphthong is only loosely attached to the coda or the syllable (cf. Lenerz 2002). The different articulatory treatment of the nasal segment can therefore be explained by its position in the constituent model. Our results are inconclusive, however, as to whether a constituent model like that of Lenerz (2002) or a syllable cut model of syllable structure should be preferred. Both assume that the
consonant after a short vowel has closer contact to the vowel, and is more strongly integrated into the syllable, than a consonant that follows a long vowel or diphthong (cf. Auer, Gilles & Spiekermann 2002; Becker 1998; Hoole & Mooshammer 2002; Lenerz 2002). Another crucial difference between the long-vowel and short-vowel items concerns their segmental structure: The former have /aɪ/ as a nuclear vowel, while the latter have /a/. The open vowel of the diphthong may enhance a more open and retracted production of the whole syllable, so that the alveolar closure is omitted more often in long-vowel items. Thus, the independent variable vowel quantity is confounded with segmental structure, which could explain part of the strong effect on the categorical variable. It should be mentioned, too, that vowel quantity is not only confounded with segmental structure, but possibly also with frequency (see below). With respect to frequency, the hypothesis grounded on usage-based theories and exemplar-theoretical work that high-frequency items are produced with more reduction is confirmed for the particle verbs, especially those with a long vowel. It should be noted that vowel quantity is correlated with extremely high frequency in our study. Thus, the strong reductions and assimilations in particle verbs with a long vowel may not only be attributable to syllable structure, as explained above, but may be additionally enhanced by extremely high absolute frequency (2,920,000 hits for ein#geben vs. 340,000 hits for hin#geben). Our results suggest that future work on frequency effects on reductional phenomena should crucially take into account vowel length and syllable structure. It would moreover be worthwhile to disentangle vowel quantity from its confounding factors (1) by comparing short-vowel monophthongs with the corresponding long-vowel monophthongs, e.g. [a] vs. [a:], and (2) by systematically comparing different long-vowel items of varying degrees of high frequency.
The descriptive analysis of speaker-specific behaviour focussed on three female speakers who were selected based on their number of alveolar closure deletions. Speaker-specific differences were demonstrated for KA and KR, who reacted to different dependent variables, and had systematically different contact patterns along the reduction scale. The comparison between speakers KR and DM yielded different assimilatory strategies in their realisations with deleted alveolar closures: While KR shifted dynamically from more front to back contact, DM's articulation of the segment sequence was static, with many items that blend the alveolar and the velar control segment, and some items that resemble the velar control segment. This means that DM assimilates categorically in some cases, but gradiently in others. Our findings therefore corroborate Hardcastle's (1995) and Ellis & Hardcastle's (2002) claim that inter-speaker variation as well as intra-speaker variation should not be neglected in the study of assimilation. Speakers may apply different strategies (KR and DM), and, as in the case of DM, they may switch between categorical assimilation and gradient assimilation. The realisations of DM, however, do not follow the reduction scale, i.e. we do not find categorical assimilation in unaccented high-frequency items as opposed to gradient assimilations in the other conditions. We therefore cannot explain under which conditions speaker DM would produce gradient assimilations or categorical assimilations.

In the present study, we were able to show that assimilation and reduction across the word boundary is influenced by syllable structure, prosodic structure (accentuation), frequency, and lexical class. Moreover, speaker-specific preferences for gradient and/or categorical assimilations were demonstrated.
References

Kohler, Klaus J.
1976 Die Instabilität wortfinaler Alveolarplosive im Deutschen: eine elektropalatographische Untersuchung. Phonetica 33: 1–30.
Kohler, Klaus J.
1990 Segmental reduction in connected speech in German: phonological facts and phonetic explanation. In: William J. Hardcastle and Alain Marchal (eds.), Speech Production and Speech Modelling, 69–92. Dordrecht: Kluwer.
Kuzla, Claudia
2009 Prosodic Structure in Speech Production and Perception. Wageningen: Ponsen & Looijen.
Labov, William
1972 Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Lenerz, Jürgen
2002 Silbenstruktur und Silbenschnitt. In: Peter Auer, Peter Gilles and Helmut Spiekermann (eds.), Silbenschnitt und Tonakzente, 67–86. Tübingen: Max Niemeyer Verlag.
Lindblom, Björn
1990 Explaining phonetic variation: a sketch of the H&H theory. In: William J. Hardcastle and Alain Marchal (eds.), Speech Production and Speech Modelling, 403–439. Dordrecht: Kluwer.
Lühken, Silvia C.
1997 Deutsche Wortprosodie. Abschwächungs- und Tilgungsvorgänge. Tübingen: Stauffenburg Verlag.
Mücke, Doris, Martine Grice and Raphaela Kirst
2008 Prosodic and lexical effects on German place assimilation. 8th International Seminar on Speech Production, 8–12 December 2008, Strasbourg.
Nespor, Marina and Irene Vogel
2007 Prosodic Phonology. 2nd ed. Berlin: de Gruyter.
Phillips, Betty S.
2006 Word Frequency and Lexical Diffusion. Houndmills/Basingstoke/Hampshire/New York: Palgrave Macmillan.
Pierrehumbert, Janet B.
2001 Exemplar dynamics: word frequency, lenition and contrast. In: Joan Bybee and Paul Hopper (eds.), Frequency and the Emergence of Linguistic Structure, 137–157. (Typological Studies in Language 45.) Amsterdam: John Benjamins.
Stephenson, Lisa
2003 An EPG study of repetition and lexical frequency effects in alveolar to velar assimilation. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona: 1891–1894.
Wiese, Richard
1996 The Phonology of German. Oxford: Clarendon Press.
Overlap-driven consequences of Nasal place
assimilation
Claire Halpert
Abstract
This paper argues that nasal place assimilation in Zulu, and more widely in Bantu,
involves temporal sliding, without temporal extension, of a trigger consonant's place
gesture onto the nasal target. This sliding, necessitated by a *Long constraint against
durational increase of gestures, combined with maintenance of timing relationships
between all the gestures of the trigger C, enforced by Alignment constraints, forces
non-place gestures of C to overlap N. In cases where such an overlap would be
phonetically marked, Zulu violates Faithfulness to the problematic C gestures, yielding
unfaithful trigger consonant outputs, including loss of laryngeal features and affrication.
Such effects do not occur in unassimilated NC clusters. A pilot study indicates that
assimilated NC in Zulu has durational consequences, in line with the analysis proposed
here. A survey of nasal place assimilation effects across the Bantu languages suggests
that this analysis can be made to account for the broader typology of NC in the
language family.
1. Introduction
2.1. Background
Zulu has the consonant inventory shown in Table 1. It includes plosives (plain,
aspirated, and implosive), nasals, fricatives, affricates, and approximants at labial,
alveolar, postalveolar, palatal, velar, labial-velar, and glottal places of articulation,
as well as dental, lateral, and postalveolar click series with nasalized counterparts.
[Table 1: Zulu consonant inventory; most IPA symbols were lost in extraction.]
Syllables in Zulu are typically open, glide insertion occurs to prevent hiatus,
and the only consonant sequences found are NC sequences (Doke 1969).1 NC
sequences, homorganic and heterorganic, arise both stem-internally and at
morpheme boundaries. I list in Table 2 the attested stem-internal sequences
found in Zulu.2
[Table 2: attested stem-internal NC sequences in Zulu. Homorganic clusters:
labial (mp, mb, mpf, mbv), alveolar (nt, nd, nts, ndz), and velar series, including
nasal-affricate sequences. Heterorganic clusters: /m/ plus a non-labial consonant
(e.g. mt, mk, ms, mz, mn) and various Cw sequences (e.g. tw, kw, dw, gw, sw,
zw, nw). Many IPA symbols were lost in extraction.]
What is striking about this distribution is that /m/ is the only nasal to
appear in heterorganic nasal-obstruent sequences; all other nasals only appear
with following homorganic consonants or a labial glide. As we will see, this
stem-internal NC distribution mirrors what we find in NC sequences occurring
at morpheme boundaries.
3. There is historical evidence that other roots in Zulu, including iɲoka 'snake' and
iɲoni 'bird', came from vowel-initial Bantu stems (-oka and -oni), with [ɲ] arising
through place assimilation. Currently, Zulu speakers seem to interpret these stems
as being ɲ-initial (Doke et al. 1990).
For the purposes of this paper, I will remain agnostic about the underlying
identity of the assimilating nasal (but see Padgett 1995 for an argument that
the assimilating nasal is underlyingly velar). For my analysis, it is sufficient
to note that the behavior of the assimilating nasal contrasts with the behavior
of /m/ in the um- prefix.
In addition to the occurrence of place assimilation of N in class 9/10
iN/iziN, the trigger segment, C, of NC sequences at the morpheme boundary
exhibits a number of additional changes, first noted by Doke (1969). These
changes are all absent in unassimilating mC sequences. Most of these effects
fall into two categories: loss of laryngeal features and postnasal hardening.4
4. One effect, postnasal voicing of non-nasal clicks, doesn't fit clearly into either category.
I will set this effect aside for the analysis here. While it will not be addressed in this
work, one potential way to analyze the voicing of non-nasal clicks in assimilat-
ing NC that is compatible with the analysis developed here is that the voicing is
necessary to be faithful to the nasal-oral sequence of the NC.
2.3. Summary
Zulu exhibits both assimilated and non-assimilated NC sequences in derived
and underlying environments. In both cases, only mC sequences may be
heterorganic; all other sequences must be homorganic. The distribution of
derived and underlying NC sequences suggests that while /m/ in Zulu does
not undergo place assimilation, all other nasals do. The effects on C in de-
rived homorganic NC sequences, shown in (5)-(8), mirror the distribution
of homorganic NC sequences in stem-internal position shown in Table 2.
The absence of such effects in the heterorganic mC sequences and derived
homorganic mC sequences (from classes 1 & 3), particularly in contrast to
their presence in assimilated mC sequences resulting from classes 9 and 10,
indicates that these effects result directly from place assimilation.
3. Analysis
Evidence from derived and underlying NC sequences indicates that all nasals
except for /m/ undergo place assimilation in all NC contexts in Zulu. Since
assimilation can result from weakened perceptual cues for place of the first
segment in a sequence, the resistance of /m/ to assimilation is perhaps due to
its greater internal perceptual cues relative to other nasals (Silverman 1997, Jun
5. Underlying N+l sequences are rather rare. However, I would like to note that
in addition to l → d in these circumstances, there is also an observed pattern of
N → ∅ (deletion of the assimilating nasal before l). The existence of both pro-
cesses is perhaps due to the low frequency of N+l sequences. As we will see in
section 5, deletion of N in markedness-creating NC contexts is a common pattern
cross-Bantu.
2004). To model the general nasal place assimilation effect in OT, I will use
the following constraints:
(10) a. Assimilate (Assim): Adjacent distinct oral constrictions are
disallowed.
b. Max(constr)/____vocoid:6 An oral constriction gesture of a
segment in pre-vocoid position in the input must have a
corresponding gesture in the output.
c. Max(labial): A labial constriction gesture in the input must have
a corresponding gesture in the output.
By ranking the two faithfulness constraints (10b) and (10c) above Assimilate,
we ensure that the place of the second segment is always preserved in
sequences, and thus drives the assimilation, and that /m/ never undergoes
assimilation. These constraints capture the basic distribution of nasal place
assimilation in Zulu and ensure that there is a single oral constriction gesture,
overlapping both segments, in the assimilated sequences. The representation
of an assimilated sequence as an overlapped structure is in line with Browman
and Goldstein's (1989) and Jun's (1995, 1996) representations of assimilation as
cases of gestural overlap. In these models, however, overlap does not entail
loss of gestures in the output. Rather, assimilated gestures can be present in
reduced, submerged, or blended form in the output. In principle, this analysis
of Zulu place assimilation could be couched in terms of gestural reduction, but
in the absence of articulatory evidence I have chosen the representation here to
reflect both the categorical nature of the phenomenon and the lack of any other
cluster or geminate sequences elsewhere in the language.
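As a rough illustration (not part of the analysis itself), the strict-domination logic behind the ranking in (10) can be sketched as follows: candidates are compared on their violation profiles in ranking order, and the best profile wins. The candidate forms and violation counts below are invented for the example.

```python
# Toy OT evaluation: pick the candidate with the lexicographically smallest
# violation profile under a fixed high-to-low constraint ranking.
# Candidates and violation counts are invented for illustration.

def evaluate(candidates, ranking):
    """candidates: {form: {constraint: violations}}; ranking: high-to-low list."""
    def profile(form):
        return tuple(candidates[form].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

# Input /iN + pa/ -> assimilated [impa]: Max(constr)/_V, Max(labial) >> Assim.
ranking = ["Max(constr)/_V", "Max(labial)", "Assim"]
candidates = {
    "[inpa]": {"Assim": 1},             # heterorganic: adjacent distinct constrictions
    "[impa]": {},                       # single overlapped labial closure: no violations
    "[inta]": {"Max(constr)/_V": 1},    # alters place of the prevocalic consonant
}
print(evaluate(candidates, ranking))  # -> [impa]
```

Because faithfulness to the prevocalic consonant outranks Assimilate, the winner is always the candidate in which the second segment's place spreads leftward.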
With this basic mechanism, only the oral closure gesture is implicated in
the assimilation, which does not allow us to account for the effects on C2
described in the previous section. In this section, I will argue that gestural
reduction to avoid markedness is responsible for the additional effects seen in
Zulu. I will employ two additional constraints to enforce the overlap of other
gestures of C2 with N. In the following sections, I will give additional motiva-
tion for adopting these constraints in Zulu.
The first constraint I will call *Long:
(11) *Long: The duration of an oral constriction gesture must not exceed
the target duration for that gesture.
6. Vocoids include all vowels and glides; Cj sequences are ruled out by an additional
markedness constraint.
The *Long constraint's reliance on target duration follows from the notion
of intrinsic duration in Articulatory Phonology, which assumes that gestures
have an intrinsic temporal component that varies across speech rates
(Browman and Goldstein 1989, Saltzman and Munhall 1989).7 The *Long
constraint is calculated against the independently derived target duration,
accruing a violation when the target is exceeded.
The result of *Long is to prevent the oral closure gesture of the consonant
from simply lengthening in order to overlap the nasal; rather, to satisfy *Long
the gesture must actually shift in order to create overlap, along the lines of
Browman and Goldstein (1992b), essentially forcing the duration of an
assimilated NC sequence to match the duration of C2 alone:8
7. I will not present a mechanism for calculating target durations here; one possible
way to do so would be to require a fixed ratio for the duration of the oral closure
gesture to the preceding vowel (cf. Port and Dalby 1982; thanks to a reviewer for
this suggestion), though such an account would need to be constructed carefully to
avoid making the prediction that lengthening the vowel would allow a longer
oral constriction to satisfy the constraint.
8. Zulu is a language without geminates (Doke 1969). A language with geminates
would presumably not have a high-ranked *Long constraint, and we would not
expect to find Zulu-type patterns.
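The arithmetic of the sliding analysis can be sketched numerically: under *Long, the oral closure gesture of C slides back to overlap N rather than stretching, so the closure span of an assimilated NC equals that of C alone. The millisecond values below are invented for illustration.

```python
# Sketch of *Long-driven overlap: the trigger C's oral closure gesture slides
# leftward to the start of N instead of lengthening. Durations (ms) invented.

def slide_closure(n_start, c_closure_dur):
    """Return (onset, offset) of C's oral closure after sliding it back to the
    start of N, keeping its target duration constant (satisfying *Long)."""
    return n_start, n_start + c_closure_dur

n_start, n_dur, c_dur = 100, 60, 80
unassimilated_total = n_dur + c_dur            # two sequential closures: 140 ms
onset, offset = slide_closure(n_start, c_dur)
assimilated_total = offset - onset             # one overlapped closure: 80 ms
print(unassimilated_total, assimilated_total)  # -> 140 80
```

This is exactly the durational prediction tested in the pilot study below: assimilated NC should pattern with singleton C, while unassimilated mC should be longer.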
as that of Padgett (1994), the fact that place and stricture must spread as a unit
is stipulated, in Articulatory Phonology it follows directly: place and stricture
are merely two components of an oral closure gesture and thus will always
function as a single unit.
If the oral constriction gesture of C were to overlap the nasal in a nasal-fricative
sequence, then, the result would be a nasal portion of NC with a critical closure;
in other words, a nasalized fricative. Nasalized fricatives are highly marked (Cohn
1993) and do not occur in Zulu, or perhaps in any Bantu language (Doke 1969,
Herbert 1986, Ladefoged and Maddieson 1993). A high-ranked markedness
constraint prohibits such segments from occurring in Zulu:
(14) *s̃: nasalized fricatives are prohibited.
To prevent *s̃ violations, the closure of the oral constriction gesture in a
fricative must change from critical to closed in order to overlap N. At the
same time, violations of the faithfulness constraint Max(constr), given in
(10b), must also be avoided. I will adopt a decomposed representation of oral
constriction gestures in Articulatory Phonology as comprising closure and
release portions (see Steriade 1993 for a discussion of affricates and Nam
2007a,b for a gestural representation of stops as containing separate closure
and release). On this view, an affricate can be represented with a single oral
closure gesture whose stricture changes from closed to critical at the point of
release, graphically represented below in Figure 3:
Figure 3. Affricate
(17) Align(Constr, offset, Glo, target): Align the offset of the oral constric-
tion gesture with the target of the glottal gesture.
With this alignment constraint and *Long both high-ranked, the gestures
in the glottal tier of C are made to overlap N.9 In the case of implosion and
aspiration, overlap with the nasal would create a highly marked structure, and
one that is not attested in Zulu (Doke 1969, Ladefoged and Maddieson 1993,
Silverman 1997). Markedness constraints against aspirated and implosive
nasals prohibit overlap of these glottal gestures with N:
9. Evidence from other Bantu languages that this overlap would, in fact, be the result
of these two constraints is discussed in section 5.
10. While these markedness constraints appear to be undominated in Zulu and may
seem superfluous, the system is such that they could be outranked by *Long and
Align, resulting in the emergence of segments that otherwise don't surface in a
language. Evidence that we might want such a system comes from languages like
Pokomo, where voiceless nasals only surface as a result of assimilation (Huffman
and Hinnebusch 1998).
In the case of aspiration, however, while the aspiration itself is lost, what
surfaces is not a plain stop but rather an ejected stop (6). The loss of aspira-
tion can be accounted for in the same way as the loss of implosion; a separate
explanation is needed for the appearance of ejection. Zulu lacks a plain voice-
less stop series, so perhaps the lack of plain voiceless stops in NC environ-
ments is unsurprising. Moreover, voicelessness in postnasal position can be
difficult to perceive (Pater 1996, Ladefoged and Maddieson 1996, Silverman
1997); in order to prevent it from simply being perceived as voiced, strategies
such as increased VOT are often employed (Hayes and Stivers 1995). As we
will see in section 5, [mp, nt, ŋk] with plain voiceless stops is not a common
output for assimilating NC sequences with voiceless stops anywhere in Bantu
(Kadima 1969, Kerremans 1980).
4.1. Method
A single female native Zulu speaker, bilingual in Xhosa and fluent in
English as an L3, was recorded producing singleton C, unassimilated mC, and
assimilated NC sequences in intervocalic position in minimal, or near-
minimal, pair words. The goal of the initial study was to examine the labial
sequences, so the relevant tokens had stem-initial /p/, /m/, and /f/ (though other
words involving mC and NC sequences appeared among the fillers). The study
included 60 target words and 60 fillers, taken from Doke et al. (1990).13 All
tokens were trisyllabic, with sequences occurring between the first and second
syllable nuclei. Tokens were recorded in the carrier phrase Angiboni X
encwadini ('I didn't see X in the book'). The speaker was instructed to speak
at a steady, normal speech rate.
11. Fallon (2002) suggests that ejectivity seems to be a general strategy in Zulu for
enhancing voicing contrasts in obstruents.
12. There is a body of literature addressing the question of duration for various types
of NC sequences cross-linguistically. A common prediction is that prenasalized
segments will have a duration matching single C, while true clusters will be
longer. These hypotheses could relate to my Zulu hypotheses, though the results
of such studies seem to be mixed (see Riehl 2008 for discussion).
13. A few tokens were constructed following phonotactics for possible words. The
speaker encountered these words initially in contexts where the underlying nature
of the stem-initial consonant was unambiguous.
4.2. Results
A one-way ANOVA was performed for each group of data (p-stems, m-stems,
f-stems, and mf-stems) to determine whether the mean durations differed signif-
icantly. Post hoc Tukey HSD tests, using α = .01, were calculated to determine
which pairwise differences in each group of data were significant. Results are
summarized below in Tables 3-5. In each group, differences among mean dura-
tions were significant. However, pairwise Tukey comparisons revealed that
while the unassimilated mC sequences (column 1 in the tables) differed sig-
nificantly from NC and C (columns 2 and 3, respectively), NC and C did not
differ significantly from each other. Boldface in the tables indicates groups that
did not differ significantly from each other in pairwise Tukey HSD tests.
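The F statistic behind such a comparison can be illustrated with a from-scratch one-way ANOVA. The duration values below are invented, chosen only to mimic the reported pattern (unassimilated mC longer than assimilated NC and singleton C); this is a sketch of the method, not the study's data.

```python
# One-way ANOVA F statistic computed from scratch, mirroring the durational
# comparison mC vs. assimilated NC vs. singleton C. Data (ms) are invented.
from statistics import mean

def one_way_anova(groups):
    """Return the F statistic for a list of groups of measurements."""
    grand = mean(x for g in groups for x in g)
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

mC = [168, 172, 165, 170]   # unassimilated: two oral closures, longer
NC = [92, 95, 90, 93]       # assimilated: one overlapped closure
C  = [91, 94, 89, 92]       # singleton
f = one_way_anova([mC, NC, C])
print(round(f, 1))
```

A large F here reflects the mC group standing apart from NC and C; the post hoc Tukey step would then localize the significant pairwise differences.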
Table 3 summarizes data from p-initial stems. The first column gives meas-
urements for unassimilated [mpʰ] outputs resulting from um- prefixation. This
unassimilated output is predicted to contain two oral closure gestures, one
each for the nasal and oral segments, and so its duration should reflect their
presence. In contrast, only a single oral closure gesture is predicted to be
present in both the assimilated [mp] and plain [pʰ] cases. As predicted, while
unassimilated [mpʰ] differed significantly from the other groups, the latter two
did not differ from each other.
Table 4 gives the results for m-initial stems. Again, the unassimilated um+m-
initial stem structure is significantly longer than the assimilated and underlying
singleton structures, though the latter two do not differ from each other.
Finally, Table 5 summarizes the data for both the f-initial stems and the
mf-initial stems. The two groups were analyzed separately, and again followed
the patterns seen with m-initial and p-initial stems: unassimilated sequences
were significantly longer than either assimilated or singleton sequences, which
again did not differ significantly from each other.
While more phonetic data are clearly needed here, these initial results indi-
cate that there are no significant differences in duration between an assimilated
NC sequence and a singleton C. This outcome is in line with the prediction
that there is only a single oral closure gesture, whose duration is limited by
*Long, in these contexts. In contrast, the evidence that unassimilated mC
sequences are significantly longer is in line with the non-overlapping analysis
of these sequences.
While the previous section examines acoustic evidence bearing on the *Long
constraint that is crucial to my analysis in Zulu, in this section I present a brief
examination of patterns of nasal place assimilation throughout Bantu that
provide language-external support for my analysis.
I have argued that in Zulu, *Long and Align conspire to cause overlap
between glottal gestures and N. Because of markedness constraints, though,
we never actually see direct evidence that such an overlap would have occurred
in Zulu; I simply infer the problematic overlap from the absence of the glottal
gestures in the output.
Several Bantu languages do show evidence of just such an overlap: Kinyar-
wanda, Pokomo, and Sukuma all have processes of nasal place assimilation in
which N followed by an underlyingly aspirated consonant results in aspiration
being realized on the nasal in the assimilated sequence (Kimenyi 1979, Sagey
1986, Maddieson 1991, Huffman and Hinnebusch 1998). Crucially, in these
languages the glottal spreading gesture maintains a constant timing with respect
to the oral constriction gesture in both C and NC; Huffman and Hinnebusch
(1998) directly credit this timing with the resulting overlap onto the nasal
portion of the sequence in Pokomo. It seems reasonable to say, then, that nasal
place assimilation is being driven by the same basic mechanisms in these
languages as in Zulu, with the only difference being the ranking of a
*Nʰ markedness constraint; its high ranking in Zulu prohibits the overlapped
sequences from surfacing as is, while in Kinyarwanda, Pokomo, and Sukuma
it is violated in favor of preserving the glottal gesture.
6. Conclusion
14. Where /p, t, k/ are not distinguished for underlying laryngeal properties.
A. Zulu stimuli
/um+ph/ /iN+ph/ /ph/
[mph] [mp] [ph]
umphambo impamba iphamba
umphahla impahla iphahla
umphalo impalo iphalo
umphako impaka iphako
umphela impela iphela
umphiki impiko uphiko
umphobo impobo iphoba
umphundo impundu iphundu
/um+m/ /iN+m/ /m/
[mm] [m] [m]
ummango imandlo umango
umminzo imini umina
ummeli imeli umema
ummese imeshe umese
/um+f/ /iN+f/ /f/
[mf] [mpf] [f]
umfiki imfiko ufiki
umfula imfule ifule
umfundi imfundo ufundo
umfusi imfusi ifusi
/um+mf/ /iN+mf / /mf/
[mf] [mf] [mf]
ummfamu imfamu umfamu
ummfumfu imfumfu umfumfu
References
Gafos, Adamantios
2002 A Grammar of Gestural Coordination. Natural Language & Linguis-
tic Theory 20: 269-337.
Hayes, Bruce and Tanya Stivers
1995 The Phonetics of post-nasal voicing. Ms., UCLA.
Herbert, Robert
1986 Language Universals, Markedness Theory, and Natural Phonetic
Processes. Berlin: Mouton de Gruyter.
Huffman, Marie and Thomas Hinnebusch
1998 The phonetic nature of voiceless nasals in Pokomo: Implications
for sound change. Journal of African Languages and Linguistics
19: 1-19.
Jun, Jongho
1995 Perceptual and articulatory factors in place assimilation: An optimality
theoretic approach. Los Angeles, CA: UCLA dissertation.
Jun, Jongho
1996 Place assimilation is not the result of gestural overlap: Evidence
from Korean and English. Phonology 13: 377-407.
Jun, Jongho
2004 Place assimilation. In: Bruce Hayes, Robert Kirchner and Donca
Steriade, (eds.), Phonetically Based Phonology. Cambridge Univer-
sity Press.
Kadima, Marcel
1969 Le Système des Classes en Bantou. Leuven: Vander.
Kerremans, R.
1980 Nasale suivie de consonne sourde en Proto-Bantou. Africana Lin-
guistica 8: 159-198.
Kimenyi, Alexandre
1979 Studies in Kinyarwanda and Bantu Phonology. Edmonton: Linguistic
Research, Inc.
Ladefoged, Peter and Ian Maddieson
1996 The Sounds of the World's Languages. Cambridge, MA: Blackwell.
Maddieson, Ian
1991 Articulatory phonology and Sukuma aspirated nasals. In: Proceed-
ings of the Berkeley Linguistics Society, Special African Session:
145-153.
Maddieson, Ian and Peter Ladefoged
1993 Phonetics of Partially Nasal Consonants. In: Marie Huffman and
Rena Krakow (eds.), Phonetics and Phonology, Volume 5: Nasals,
Nasalization, and the Velum. San Diego: Academic Press.
Meinhof, Carl
1932 Introduction to the Phonology of the Bantu Languages. Berlin:
Dietrich Reimer.
Nam, Hosung
2007a Gestural coupling model of syllable structure. New Haven, CT: Yale
dissertation.
Nam, Hosung
2007b Syllable-level intergestural timing model: Split-gesture dynamics
focusing on positional asymmetry and moraic structure. In: Jennifer
Cole and Jose Ignacio Hualde (eds.), Papers in Laboratory Phonology
IX. Berlin: Mouton de Gruyter.
Padgett, Jaye
1994 Stricture and Nasal Place Assimilation. Natural Language & Lin-
guistic Theory 12: 465-513.
Padgett, Jaye
1995 Partial Class Behavior and Nasal Place Assimilation. Proceedings
of the Arizona Phonology Conference: Workshop on Features in
Optimality Theory. Tucson: Coyote Working Papers, University of
Arizona.
Padgett, Jaye
2001 The Unabridged Feature Classes in Phonology. Ms., University of
California, Santa Cruz.
Pater, Joe
1996 *NC. In: Kiyomi Kusumoto (ed.), Proceedings of NELS 26. Amherst,
MA: GLSA.
Port, R., and J. Dalby
1982 Consonant/Vowel Ratio as a Cue for Voicing in English. Perception
and Psychophysics 32: 141-152.
Riehl, Anastasia
2008 The phonology and phonetics of nasal obstruent sequences. Ithaca,
NY: Cornell dissertation.
Sagey, Elizabeth
1986 The representation of features and relations in non-linear phonology.
Cambridge, Mass.: MIT dissertation.
Saltzman, Elliot and Kevin Munhall
1989 A Dynamical Approach to Gestural Patterning in Speech Produc-
tion. Ecological Psychology 1 (4): 333-382.
Son, Minjung, Alexei Kochetov, and Marianne Pouplier
2007 The role of gestural overlap in perceptual place assimilation: Evi-
dence from Korean. In: Jennifer Cole and Jose Ignacio Hualde
(eds.), Papers in Laboratory Phonology IX. Berlin: Mouton de
Gruyter.
Silverman, Daniel
1997 Phasing and Recoverability. New York: Garland.
Steriade, Donca
1993 Closure, release and Nasal Contours. In: Marie Huffman and Rena
Krakow (eds.), Phonetics and Phonology, Volume 5: Nasals, Nasal-
ization, and the Velum. San Diego: Academic Press.
The acoustics of high-vowel loss in a Northern Greek
dialect and typological implications*
Nina Topintzi and Mary Baltazani
Abstract
We offer an analysis of Vowel Deletion in the Kozani Greek (NW Greece) dialect,
investigating the environment, the acoustic correlates, the various realisation stages
and the vowel quality differences in its application. Our data suggest that Vowel Dele-
tion is gradient and variable, correlating with increased aspiration and duration of
the consonants adjacent to the deleted vowel to an extent, but not reliably so for all
segments. Furthermore, there is an asymmetry between high vowels in the application
of Vowel Deletion, with [i] more resistant to Vowel Deletion than [u]. Our concurrent
exploration of the consonantal clusters created as a result of Vowel Deletion in Kozani
Greek unveils a wider inventory of consonantal clusters as well as a richer range of
codas emerging in this dialect compared to Standard Greek. Beyond the descriptive
goals of the paper, we also discuss the theoretical implications of the Kozani Greek
data for the typology of Vowel Deletion. The application of Vowel Deletion between
voiced consonants in Kozani Greek is an extremely rare phenomenon which has so
far been left unaccounted for by gestural overlap theories of Vowel Deletion. We
tentatively argue that gestural overlap can extend to this case and hypothesise its
specific effects.
1. Introduction
Northern Greek dialects (roughly covering the areas of central Greece, Thessaly,
Macedonia, Epirus, Thrace, Euboea, and some islands in the Ionian and NE
Aegean) have a characteristic process of high-vowel (i, u) deletion (VD) in
unstressed syllables leading to the creation of various consonant clusters, as
shown in (1).
(1) Northern Greek   Standard Greek
ˈpliθka   ˈpliθika   'I washed'
plí       pulí       'bird'
fsá       fisá       'blow'
vnó       vunó       'mountain'
The term VD (vowel deletion) here is not used in the narrow sense of
vowel elision; rather, it refers to the phenomenon which phonetically gets to
be realised along a continuum of processes (see below for details), chief
among which are vowel devoicing and elision itself. Whenever a distinction
needs to be made among the processes discussed, we will spell it out explicitly.
Moreover, Greek VD is unrelated to the process of metrically-driven vowel
deletion occurring in other languages as a means to satisfy some metrical
requirement (cf. Gouskova 2003).1 For instance, in odd-parity words of Yidiɲ,
the final vowel is deleted so that all material is parsed into unmarked binary
feet, leaving no syllable unparsed, e.g. /gindanu/ → (gin.dá:n), *(gin.dá:)nu
'moon-abs' vs. /gindanu-gu/ → (gínda)(núgu) 'moon-erg'. In contrast, VD
in Northern Greek may actually produce metrically marked structures, as in
e.g. /spiti/ → (spít) 'house' with a marked unary foot, instead of
the Standard Greek (spí.ti), which presents an unmarked binary one.
VD, despite being pervasive in Greek, is as yet poorly understood. Our paper
aspires to shed light on Greek VD from an acoustic point of view, to examine its
effects with respect to consonant cluster formation, and to compare its manifesta-
tion to other instances of the phenomenon typologically. The first goal is driven
by the paucity of research on Northern Greek VD. While it is a phenomenon
widely cited impressionistically within Greek linguistics (Chatzidakis 1905;
Papadopoulos 1927; Newton 1972; Browning 1991; Kondosopoulos 2000;
Trudgill 2003), it has barely been investigated phonetically for this cluster of
dialects. More recently there have been a number of experimental studies inves-
tigating VD in Cypriot Greek (Eftychiou 2008) and in Standard Modern
Greek (Dauer 1980; Arvaniti 1994, 1999; Nicolaidis 2001, 2003; Baltazani
2007a, b; Loukina 2008), the majority of which suggest that it is common for
high vowels. Our choice to study Kozani Greek (NW Greece) is justified by
the fact that in this dialect VD occurs habitually, whereas in most of the other
dialects it is less regular. We thus hope to offer a more comprehensive explora-
tion of this phenomenon in Greek.
Our study leads to a number of findings. In particular, we show that VD
correlates with increased aspiration and duration of the consonants adjacent
to the deleted vowel to an extent, but not reliably for all segments. In addition,
we confirm the gradience and variability of VD also reported in cross-linguistic
research. Furthermore, we observe a rather dramatic asymmetry between the
high vowels in the application of VD, such that [i] appears more resistant to
VD than [u].
2. Data collected
Our data come from recordings of a male speaker of Kozani Greek (KG) in
his 60s. The recording was conducted by the first author in December 2007 in
Kozani. Kozani is a city of about 50,000 inhabitants in northern Greece, located
in the western part of Macedonia, 120 km south-west of Thessaloniki. The
speaker, Lazaros Kouziakis, read aloud one of the stories he collected in
Kouziakis (2008), a volume with a collection of stories describing aspects of
life in Kozani during the past decades. The piece we analysed relates the story
of a trumpeter. It contains 1264 words and 5555 segments and is approxi-
mately 18 minutes long.
372 Nina Topintzi and Mary Baltazani
3. Results
This section presents the results of our study, categorising them into three dis-
tinct subsections. Section 3.1 reports general observations regarding VD that
bring it on a par with other languages that exhibit VD; 3.2 presents the conso-
nantal effects resulting from VD; and 3.3 focuses on more specific aspects of
Kozani VD itself.
2. In Jun and Beckman (1993, 1994) the causation chain is the reverse: aspirated
consonants cause devoicing and not the other way round.
Figure 1. Token variability of pretonic [i] in [tsitsána] (a female name). The upper
panel shows a full vowel; the middle panel has a voiced fricative instead of
[i]; the lower panel shows total deletion of the segment.
Figure 2. Aspiration in stops is not consistently longer after VD (left panel). Stop
closure duration is not consistently longer after VD (right panel).
We also examined the sum of duration + aspiration changes in the two con-
ditions to determine whether there was an additive effect of VD, but as is
shown in Figure 3, vowel deletion only seems to have an effect on the duration
of [t] and no effect on the duration of [p, c, k].
3. An anonymous reviewer correctly points out that what we have called aspiration
may be frication at the release of a coronal stop into the narrow constriction of a
high vowel, explaining the difference between [t] and [c] on the one hand and [p]
and [k] on the other. This distinction merits further exploration; however, the fact
remains that regardless of the exact phonetic nature of this interval, in VD envi-
ronments the period between the burst of a stop and the onset of the next segment
is longer than in non-VD environments.
Figure 5. Most fricatives (left panel) & sonorants (right panel) are longer after VD.
4. In the graphs below, the following symbols have been used for convenience: sh = ʃ,
xj = ç, nj = ɲ, lj = ʎ.
followed by a labial-initial word, e.g. /tin porta/ [m bórta] 'the door', but
without any lengthening effect. Presumably, this is because it belongs to a
larger prosodic word, and in that position it is not final. As for the palatal
segment, the transition between it and a following vowel is characterised by a
[j]-like onglide, which makes the CV boundary very elusive; we therefore
suspect that our measurements in the No-deletion condition underestimated
the duration of the consonant, something that did not happen in the Deletion
condition, since in that case the neighbouring sound was a consonant, making
segmentation much easier.
Duration increase thus superficially seems a relatively good indicator of
VD for fricatives and sonorants, but it is not infallible. To decide how reliable
the above results were, we also calculated the standard deviation (stdev) for
the duration measurements of all the sounds above. It turns out that this
number is larger in deletion cases than in non-deletion ones, which suggests
that there is greater variability in the duration of consonants after VD than
when no deletion takes place.
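The comparison just described amounts to a simple computation over the two sets of duration measurements. A minimal sketch, using invented duration values rather than the measurements reported in this study:

```python
from statistics import mean, stdev

# Illustrative consonant durations in ms; these values are invented
# for exposition and are not the measurements reported here.
no_deletion = [82, 85, 80, 84, 83, 81]   # consonant next to an intact vowel
deletion = [88, 104, 79, 112, 95, 70]    # consonant left adjacent after VD

# A larger standard deviation in the deletion condition indicates
# greater variability of consonant duration after VD.
for label, durations in (("no deletion", no_deletion), ("deletion", deletion)):
    print(f"{label}: mean = {mean(durations):.1f} ms, stdev = {stdev(durations):.1f} ms")
```

With values like these, the deletion condition shows both a longer mean and a markedly larger stdev, which is the pattern at issue in the text.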
Figure 6. Standard deviation from the average value of duration for stop aspiration
(right panel) is greater than for stop closure (left panel). Higher values of
this number show greater variability in duration.
The least variability after VD appears in stop closure duration, whereas it
is greater for stop aspiration, especially for [t], which, recall, was most affected
by VD (Fig. 6). Variability proves even greater for fricatives and sonorants
(Fig. 7).
The acoustics of high-vowel loss in a Northern Greek dialect 379
Figure 7. Standard deviation from the average value of duration for fricatives (left
panel) and sonorants (right panel). There are higher values of stdev for the
VD condition suggesting greater duration variability after VD.
There are various ways to interpret these results. The most extreme
one is to suggest that none of the properties shown above is systematically an
effect of VD, since too much variability appears. Another, more conservative,
and perhaps more insightful explanation is that the duration of underlying
consonants is more stable than that of derived ones, as mirrored by the
reduced variability of consonantal duration without VD. Speakers may thus
be attuned to associate VD with greater duration fluctuation.
Of course, not all patterns appear with equal frequency (Table 1).
5. Prosodic position has also been argued to affect VD. For example, positions where
prosodic lengthening occurs are less likely to induce devoicing or deletion (Jun
and Beckman 1994 and references therein). Table 1 reveals that almost half
Figure 8. 53% of unstressed [i] do not delete (left pie chart). 25% of unstressed [u] do
not delete (right pie chart).
Table 2 gives more information on [i] VD (Column B). First, no [i] can
delete unless it is immediately adjacent to the stressed syllable (cf. rows 1,
4 & 5 vs. rows 2 & 3). For [i]s that fail to delete even though they could
(Column A), it does not really matter whether the segment is before or after
stress: 49% of the [i]-tokens are post-tonic (rows 4 & 5) and 51% are pre-tonic
(rows 1, 2 & 3). Of the latter, most occur exactly one syllable before the
stressed one, whereas 9% appear 2, 3 or 4 syllables away from it. This can
be seen as a strengthening phenomenon of the pre-tonic position, something
that has been observed in other languages such as English (Turk and White
1999) and Spanish, Romanian and Portuguese (Chitoran and Hualde 2007).7
As for [u] (Table 3), roughly equal proportions fail to delete (although they
potentially could) in either pre- or post-tonic position (2nd column). This is on
a par with the [i]-ND results.
A third asymmetry concerns where in the word VD occurs more often for
each of the vowels [i] and [u]. Setting aside the voicing specifications of the
surrounding consonants (to be discussed in Section 5), a comparison
between the two panels in Figure 9 reveals that overall i-deletion (left panel)
occurs in all positions within the word (initial, medial, final), whereas u-
deletion (right) appears almost exclusively word-initially.
Figure 9. Position within the word where i-deletion (left panel) and u-deletion (right)
occur. i-deletion occurs in all positions within the word (initial, medial,
final), whereas u-deletion is largely confined to the word-initial one.
(I = initial, F = final, H = medial).
One final asymmetry between [i] and [u] crops up. Before we present it,
though, we need to describe another characteristic process of Northern Greek
dialects, unstressed mid-vowel raising, whereby we get /pei/ → [pi] 'child',
/lio/ → [lu] 'a little'. In some dialects, raising and VD interact so that
raising feeds VD, e.g. /pei/ → [pi] → [p] in Mesolongi (Chatzidakis
1905: 261), but in most, including KG for the most part,9 such a chain shift
is inapplicable. Consequently, surface high vowels may originate either from
underlying high vowels or from underlying mid vowels /e/ and /o/, which raise
to [i] and [u] respectively, when unstressed, due to vowel raising.10
9. We say 'for the most part', because on occasion we have also seen VD of /e/ or /o/
in our data, e.g. /istera/ → [stra]. It is possible to argue that such forms are under
the influence of neighbouring dialects, e.g. the Velvendos dialect (Velvendos is a
town 33 km NE of Kozani), where raising feeds VD. On that view, we must assume
an intermediate stage of vowel raising, i.e. [stira], that subsequently underwent
VD.
The fourth asymmetry, then, relates to the source of surface high vowels:
while only 30% of unstressed surface [i]s hail from underlying /e/, the
number for unstressed surface [u]s differs significantly: only 8% stem
from underlying /u/, and the remaining 92% come from input /o/. We
also anticipate that KG underlying high vowels should delete when unstressed,
but derived ones should not. However, this prediction is not entirely borne
out: 70% of unstressed surface [i]s started high in the input too and should
have deleted but did not, compared to only 8% of unstressed surface [u]s
failing to delete although they stemmed from underlying /u/.
To recap, we have identified four main asymmetries between [i] and [u]
VD, summarised below:
– [u] deletes more than [i] (75% vs. 47%)
– [u]-deletion tends to be pre-tonic; [i]-deletion is overwhelmingly post-tonic
– [u]-deletion systematically occurs word-initially; [i]-deletion occurs in all
positions in the word
– most remaining unstressed surface [u]s are derived; most remaining
unstressed surface [i]s are underlying
All in all, our data thus reveal that [i] is more resistant to VD, whereas [u]
tends to delete more. Similar results, albeit debatable (see Tsuchida 2001:
227), have been reported for Japanese (Han 1962; Maekawa 1983). The exact
opposite situation emerges in Turkish (Jannedy 1995: 80), where [u] is slightly
more resistant to VD than the other high vowels of Turkish [i y ɯ]. Differences
in the application of high-vowel deletion based on the vowel's quality thus
seem to arise on a language-specific basis (see also Gordon 1998: 103, fn. 15).
But what is the cause of this asymmetry?11 An obvious answer could be
vowel duration. Recall that high vowels are usually subject to VD due to their
short duration. It is thus conceivable that [u] is more prone to VD than [i]
because it is shorter. SMG vowel measurements are not clear on this point;
10. An anonymous reviewer raised the question of whether there are contexts in which
this stem and others like it surface with a mid vowel. Although this stem does not
surface with a mid vowel, and its derivation from [e] is therefore opaque in the
dialect, there are other stems where [] and [i] alternate in a paradigm, making the
reason for non-deletion of the unstressed [i] transparent, e.g. [cif] 'head' ~
[punucfalus] 'headache', [kasirc] 'cheese (diminutive)' ~ [kasr] 'cheese', etc.
11. A reviewer makes a very interesting suggestion regarding potential differences in
the morphosyntactic load of /i/ and /u/ (cf. Gafos and Ralli 2001). Greek is highly
inflectional, and /i/ seems to carry more morphosyntactic features than /u/. If
that is the case, then its deletion would endanger its recoverability more than the
deletion of /u/. This hypothesis definitely merits exploration, to be carried out in
future work.
Nicolaidis (2003) finds that unstressed [u] is shorter than unstressed [i],
whereas Fourakis, Botinis and Katsaiti (1999) find the reverse. In both cases
the length difference is only about 7–9 ms, which is presumably hardly notice-
able. Our own measurement for KG vowels shows that, on average, [u] is
longer than [i] by 10 ms, contra our expectations. Again, the difference is not
only small but, more importantly, the standard deviation value is very
large, and if it is taken into account, then we cannot truly find a difference in
duration between the two vowels.
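The point that a 10 ms mean difference is swamped by a very large standard deviation can be made concrete with a standardized effect size (Cohen's d): the smaller |d| is, the less the two distributions can be told apart. The duration values below are invented for illustration and are not our measurements.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Standardized mean difference between two samples."""
    pooled = sqrt(((len(a) - 1) * stdev(a) ** 2 + (len(b) - 1) * stdev(b) ** 2)
                  / (len(a) + len(b) - 2))
    return (mean(a) - mean(b)) / pooled

# Invented vowel durations in ms: [u] averages 10 ms longer than [i],
# but both samples are highly variable.
u_durations = [95, 60, 130, 85, 110, 70]
i_durations = [85, 50, 120, 75, 100, 60]

print(f"mean difference = {mean(u_durations) - mean(i_durations):.1f} ms")
print(f"Cohen's d = {cohens_d(u_durations, i_durations):.2f}")  # below the conventional 'medium' 0.5
```

With this kind of spread, a 10 ms mean difference yields an effect size well short of the conventional "medium" threshold, i.e. no real duration difference can be claimed.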
The shorter-[u] duration hypothesis, however, cannot yet be eliminated. This
is because the unstressed u-tokens in our data were very few; hence, our
duration measurement may not be entirely reliable. This apparent weakness
is by no means intrinsic to our study. Instead, it relates to general vowel
frequency effects. In a study of the occurrence frequency of all segments of
Standard Greek undertaken by the Institute for Language and Speech Process-
ing (ILSP), based on a corpus containing 148,333,836 SMG phone tokens
(Protopapas et al. 2010), unstressed [i] is the vowel occurring most frequently
in SMG (22% among vowels), while unstressed [u] is the least frequent (4%
among vowels), giving a 5:1 [i]:[u] ratio. Although no similar study has been
conducted on the frequency count of dialectal vowels, our hypothesis is that
the [i]:[u] ratio will not differ greatly in KG either.
12. No consonant clusters are created, of course, in cases of incomplete deletion.
Even in tokens without any spectral evidence of vowel presence, however, we
cannot safely assume that speakers perceive acoustically adjacent consonants as
consonant clusters. It is possible that speakers still have a vowel in their
phonological representation and that what we treat in the following discussion as
consonant clusters are not really such in the speakers' minds.
14. On the special status of /s/ in clusters and various other possibilities of
syllabification, see Goad (2011).
5. Typological observations
15. Dauer's (1980) study on Standard Greek reports the same results regarding the dis-
tribution of VD, although she claims that instances where VD occurs after a voiceless
C1 are somewhat more frequent than those where C2 is voiced. She also states
that reduction between voiced Cs happens but is very rare, which is why she
disregards it entirely in the ensuing discussion.
We propose, however, that VD of this type does occur and that gestural overlap
can extend to it too. In fact, 12% of KG VD occurs between voiced consonants,
e.g. /ua/ → [] 'work, job', /duvarja/ → [dvrja] 'walls', /maiula/ →
[mala] (a female name) (cf. Fig. 10).
Figure 10. Complete i-deletion between voiced Cs in the word [maiula] (a female
name).
Recall that in this paper VD has been used as a cover term and does not
refer specifically to vowel devoicing or vowel deletion. The latter two are just
two of the stages encompassed by the phenomenon in question. What we
predict, then, is that between voiced consonants all stages of VD should be able
to emerge, save one: vowel devoicing itself.16 This is because voiced con-
sonants have the same type of glottal gesture as vowels. Thus, neither of the
consonants can be associated with a devoicing gesture that could overlap into
the vowel. Consequently, VD, with the exception of the devoicing stage, may
occur.
Given the above, we hypothesise that word-medial VD as a phenomenon
may appear between all types of consonants in terms of voicing. However, its
possible realisations between voiced consonants form a subset of those emerg-
ing between other combinations of consonants. The hypothesised situation is
schematised in (6). At present, we lack sufficient data to test this prediction
adequately; nonetheless, initial examination of the data at hand seems to
support our proposal. We anticipate that future work will be able to offer a
more conclusive answer.
Figure 11. Final i-deletion in the word [spit], accompanied by aspiration and formant
structure, but no voice bar.
Moreover, we contend that the gestural overlap account (GOA) alone is not
sufficient to explain the full range of attested facts cross-linguistically. There
are numerous other traits that it leaves unaccounted for, which should be
further investigated. For example, GOA cannot explain why in Kozani Greek
VD is much more frequent when C2 is voiceless (row c) than when C1 is
(row b) (see Table 1, repeated here as Table 4), although the two patterns are
identical in the sense that both share the presence of a [−voi] and a [+voi] con-
sonant (but in different linear order).
Table 4. VD frequency in different voicing environments. The last three columns show,
from left to right, frequency in word-medial position (% medial), in word-final
position (% final), and in all positions considered together (Total %).

Pattern            [i]  [u]  Total #  % medial  % final  Total %
a. −voi VD −voi     31    8       39     40.21        –    20.31
b. −voi VD +voi     10    2       12     12.38        –     6.25
c. +voi VD −voi     34    1       35     36.08        –    18.23
d. +voi VD +voi      6    5       11     11.34        –     5.73
e. −voi VD #        44   11       55         –     57.9    28.65
f. +voi VD #        35    5       40         –     42.1    20.83
TOTAL              160   32      192                       100
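The percentage columns of Table 4 are straightforward normalizations of the raw counts: rows (a)–(d) over the 97 word-medial tokens, rows (e)–(f) over the 95 word-final tokens, and the last column over all 192 tokens. The bookkeeping can be sketched as follows:

```python
# Raw VD token counts from Table 4 (summed over [i] and [u]).
medial = {"a": 39, "b": 12, "c": 35, "d": 11}  # word-medial voicing patterns
final = {"e": 55, "f": 40}                     # word-final patterns

medial_total = sum(medial.values())            # 97
final_total = sum(final.values())              # 95
grand_total = medial_total + final_total       # 192

for row, n in medial.items():
    print(f"{row}: {100 * n / medial_total:.2f}% of medial, "
          f"{100 * n / grand_total:.2f}% of all tokens")
for row, n in final.items():
    print(f"{row}: {100 * n / final_total:.1f}% of final, "
          f"{100 * n / grand_total:.2f}% of all tokens")
```

Running this reproduces the table's percentages up to rounding (e.g. 39/97 = 40.21% of medial tokens, 55/95 = 57.9% of final tokens).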
17. Only the left panel of Fig. 9 is used here, since the right panel contains too few
data points to allow us any claim. Also, reference is made solely to the word-medial
position, since it is the one that shows the most systematic effects.
18. Thanks to Lasse Bombien for suggesting this line of thought to us.
19. The C[−voi] C[+voi] string implies either the sequence T-S or T-D, where S = sonorant,
T = voiceless obstruent, D = voiced obstruent. In a heterosyllabic analysis, both are
ill-formed in terms of Syllable Contact; however, we cannot rule out the possibility
of a tautosyllabic analysis in terms of complex onsets, e.g. TS. Such a cluster would
be well-formed, but TD would not (for reasons having to do with consonant
phonotactics in Greek). At present, we assume that heterosyllabic syllabification
is preferred over tautosyllabic syllabification for derived consonant clusters,
though this is a matter that requires further investigation.
6. Conclusion
References
Arvaniti, Amalia
1994 Acoustic features of Greek rhythmic structure. Journal of Phonetics 22: 239–268.
Arvaniti, Amalia
1999 Illustrations of the IPA: Standard Greek. Journal of the International Phonetic Association 29: 167–172.
Arvaniti, Amalia
2001 Comparing the phonetics of single and geminate consonants in Cypriot and Standard Greek. Proceedings of the Fourth International Conference on Greek Linguistics, 37–44. Thessaloniki: University Studio Press.
Baertsch, Karen
2002 An optimality-theoretic approach to syllable structure: the split margin hierarchy. Ph.D. dissertation, Indiana University.
Baltazani, Mary
2006 Focusing, prosodic phrasing, and hiatus resolution in Greek. In Louis Goldstein, Douglas Whalen and Catherine Best (eds.), Laboratory Phonology 8, 473–494. Berlin/New York: Mouton de Gruyter.
Baltazani, Mary
2007a Prosodic rhythm and the status of vowel reduction in Greek. In Selected Papers on Theoretical and Applied Linguistics from the 17th International Symposium on Theoretical and Applied Linguistics, 31–43. Thessaloniki: Monochromia.
Baltazani, Mary
2007b The effect of prosodic boundaries on syllable duration in Greek. Paper presented at the Old World Conference in Phonology 4, Rhodes, 18–21 January 2007.
Berent, Iris, Donca Steriade, Tracy Lennertz and Vered Vaknin
2007 What we know about what we have never heard: Evidence from perceptual illusions. Cognition 104: 591–630.
Boersma, Paul and David Weenink
2009 Praat: doing phonetics by computer. Computer program; available at: http://www.praat.org/.
Browning, Robert
1991 Medieval and Modern Greek [ ]. 1st edition 1962, 2nd edition 1983; Greek edition 1991. Athens: Papadima Publications.
Chatzidakis, Georgios
1905 Medieval and Modern Greek A' [ ']. Athens: P.D. Sakellarios.
Chitoran, Ioana and Ayten Babaliyeva
2007 An acoustic description of high vowel syncope in Lezgian. Proceedings of the 16th International Congress of Phonetic Sciences, 2153–2156. Saarbrücken, Germany.
Chitoran, Ioana and José I. Hualde
2007 From hiatus to diphthong: The evolution of vowel sequences in Romance. Phonology 24(1): 37–75.
Dauer, Rebecca
1980 The reduction of unstressed high vowels in Modern Greek. Journal of the International Phonetic Association 10: 17–27.
Delforge, Ann Marie
2008 Unstressed vowel reduction in Andean Spanish. In Laura Colantoni and Jeffrey Steele (eds.), Selected Proceedings of the 3rd Conference on Laboratory Approaches to Spanish Phonology, 107–124. Somerville, MA: Cascadilla Proceedings Project.
Eftychiou, Eftychia
2008 Lenition processes in Cypriot Greek. Ph.D. dissertation, University of Cambridge.
Fourakis, Marios
1986 An acoustic study of the effects of tempo and stress on segmental intervals in Modern Greek. Phonetica 43: 172–188.
Fourakis, Marios, Antonis Botinis and Maria Katsaiti
1999 Acoustic characteristics of Greek vowels. Phonetica 56: 28–43.
Gafos, Adamantios and Angela Ralli
2001 Morphosyntactic features and paradigmatic uniformity in two dialects of Lesvos. Journal of Greek Linguistics 2: 41–73.
Goad, Heather
2011 The representation of sC clusters. In Marc van Oostendorp, Colin Ewen, Beth Hume and Keren Rice (eds.), The Blackwell Companion to Phonology, vol. II, chapter 38. Oxford: Wiley-Blackwell.
Gordon, Matthew
1998 The phonetics and phonology of non-modal vowels: a cross-linguistic perspective. Berkeley Linguistics Society 24: 93–105. [Online at: http://www.linguistics.ucsb.edu/faculty/gordon/Nonmodal.pdf; accessed 28 July 2011.]
Gouskova, Maria
2001 Falling sonority onsets, loanwords, and Syllable Contact. In Mary Andronis, Christopher Ball, Heidi Elston and Sylvain Neuvel (eds.), CLS 37: The Main Session. Papers from the 37th Meeting of the Chicago Linguistic Society, vol. 1, 175–185. Chicago, IL: CLS.
Gouskova, Maria
2003 Deriving economy: syncope in Optimality Theory. Ph.D. dissertation, University of Massachusetts, Amherst.
Gouskova, Maria
2004 Relational hierarchies in OT: the case of syllable contact. Phonology 21(2): 201–250.
Han, Mieko Shimizu
1962 Unvoicing of vowels in Japanese. Onsei no Kenkyuu 10: 81–100.
Hooper [Bybee], Joan
1976 An Introduction to Natural Generative Phonology. New York: Academic Press.
Jannedy, Stefanie
1995 Gestural phasing as an explanation for vowel devoicing in Turkish. OSU Working Papers in Linguistics 45: 56–84.
Jun, Sun-Ah and Mary Beckman
1993 A gestural-overlap analysis of vowel devoicing in Japanese and Korean. Paper presented at the 67th Annual Meeting of the Linguistic Society of America, Los Angeles, CA.
Jun, Sun-Ah and Mary Beckman
1994 Distribution of devoiced high vowels in Korean. Proceedings of the 1994 International Conference on Spoken Language Processing, vol. 2, 479–482.
Kondo, Mariko
1994 Is vowel devoicing part of the vowel weakening process? In Proceedings of the Edinburgh Linguistics Department Conference 1994, 55–62. [Online at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.8089; accessed 28 July 2011.]
Kondosopoulos, Nikolaos
2000 Dialects and Idioms of Modern Greek [ ]. 3rd edition. Athens: Gregori Publications.
Kouziakis, Lazaros
2008 I've heard, I've been told and I've written [', ' ]. Kozani.
Loukina, Anastassia
2008 Regional phonetic variation in Modern Greek. Ph.D. dissertation, University of Oxford.
Maekawa, Kikuo
1983 On vowel devoicing in Standard Japanese [Kyootsuugo-ni okeru boin-no museika-ni tsuite]. Gengo-no Sekai 1: 69–81.
McCawley, John D.
1968 The Phonological Component of a Grammar of Japanese. The Hague: Mouton.
Mo, Yoonsook
2007 Temporal, spectral evidence of devoiced vowels in Korean. In Proceedings of the 16th International Congress of Phonetic Sciences, 445–448. Saarbrücken, Germany. [Online at: http://www.icphs2007.de/conference/Papers/1597/1597.pdf; accessed 28 July 2011.]
Newton, Brian
1972 The Generative Interpretation of Dialect: A Study of Modern Greek Phonology. Cambridge: Cambridge University Press.
Nicolaidis, Katerina
2001 An electropalatographic study of Greek spontaneous speech. Journal of the International Phonetic Association 31: 67–85.
Nicolaidis, Katerina
2003 Acoustic variability of vowels in Greek spontaneous speech. Proceedings of the 15th International Congress of Phonetic Sciences, 3221–3224. Barcelona, Spain.
Papadopoulos, Anthimos
1927 Grammar of Modern Greek Northern Idioms [ ]. Athens: P.D. Sakellarios.
Protopapas, Athanassios, Marina Tzakosta, Aimilios Chalamandaris and Pirros Tsiakoulis
2010 IPLR: An online resource for Greek word-level and sublexical information. Language Resources and Evaluation, Online First, 2 September 2010. [Online at: http://users.uoa.gr/~aprotopapas/CV/pdf/Protopapas_etal_LRE-IPLR.pdf; accessed 28 July 2011.]
Shiraishi, Hidetoshi
2003 Vowel devoicing of Ainu: How it differs and not differs from vowel devoicing of Japanese. In T. Honma, M. Okazaki, T. Tabata and S. Tanaka (eds.), A New Century of Phonology and Phonological Theory. A Festschrift for Professor Shosuke Haraguchi on the Occasion of His Sixtieth Birthday, 237–249. Tokyo: Kaitakusha.
Teshigawara, Mihoko
2002 Vowel devoicing in Tokyo Japanese. In G.S. Morrison and L. Zsoldos (eds.), Proceedings of the North West Linguistics Conference 2002, 49–65. Burnaby, BC, Canada: Simon Fraser University Linguistics Graduate Student Association.
Trudgill, Peter
2003 Modern Greek dialects: a preliminary classification. Journal of Greek Linguistics 4: 45–64.
Tsuchida, Ayako
2001 Japanese vowel devoicing: cases of consecutive devoicing environments. Journal of East Asian Linguistics 10: 225–245.
Turk, Alice and Lawrence White
1999 Structural effects on accentual lengthening in English. Journal of Phonetics 27: 171–206.
Vaux, Bert and Andrew Wolfe
2009 The appendix. In Eric Raimy and Charles Cairns (eds.), Contemporary Views on Architecture and Representations in Phonology, 101–143. Cambridge, MA: MIT Press.
Vennemann, Theo
1988 Preference Laws for Syllable Structure and the Explanation of Sound Change. Berlin: Mouton.
Zec, Draga
2007 The syllable. In Paul de Lacy (ed.), The Cambridge Handbook of Phonology, 161–194. Cambridge: Cambridge University Press.
Appendix
Editors

Philip Hoole
Institute of Phonetics and Speech Processing, Ludwig-Maximilians-Universität, Munich

Lasse Bombien
Institute of Phonetics and Speech Processing, Ludwig-Maximilians-Universität, Munich

Marianne Pouplier
Institute of Phonetics and Speech Processing, Ludwig-Maximilians-Universität, Munich

Christine Mooshammer
Haskins Laboratories, New Haven

Barbara Kühnert
Institut du Monde Anglophone & Laboratoire de Phonétique et Phonologie, CNRS/Sorbonne-Nouvelle, Paris
Contributors

Mary Baltazani
Department of Linguistics, University of Ioannina

Marie-Anne Barthez
Language Reference Center, Clocheville Hospital, Tours; Regional University Hospital Center, Tours

Pia Bergmann
Deutsches Seminar: Germanistische Linguistik, University of Freiburg

Natalie Boll-Avetisyan
Department of Linguistics, University of Potsdam, and Utrecht Institute of Linguistics, Utrecht University

Sandrine Ferré
INSERM, U930, Tours, and Université François-Rabelais de Tours, CHRU de Tours, UMR-S930, Tours

Louis Goldstein
University of Southern California and Haskins Laboratories, New Haven

Martine Grice
IfL Phonetik, University of Cologne

Claire Halpert
Department of Linguistics and Philosophy, MIT, Cambridge, MA

Anne Hermes
IfL Phonetik, University of Cologne

Fang Hu
Institute of Linguistics, Chinese Academy of Social Sciences, Beijing

Rina Kreitman
Columbia University, New York

Yasutomo Kuwana
Asahikawa Jitsugyo High School, Asahikawa

Stefania Marin
Institute of Phonetics and Speech Processing, Ludwig-Maximilians-Universität, Munich
Language index
See also Appendix I and the Language Database of the chapter by Rina Kreitman (pp. 53–58)
Ainu 373, 392, 396
Amuesha 47–48, 53, 56, 61
Arabic 56, 278
  Moroccan Arabic 48, 60, 68, 157, 160, 175, 229
Athapascan 79
Avar 80

Babungo 38, 56, 67
Baltic 57, 79
Bantoid 79
Bantu
  Kinyarwanda 363, 367
  Pokomo 358, 363, 367
  Sukuma 363, 367
  Zulu 345–360, 363–366
Basque 17, 30, 39, 53, 56, 63, 82, 90
Berber 48, 56, 60, 157, 160–161, 174, 227
Bilaan 43–44, 47–48, 53, 56, 60
Biloxi 47–48, 53, 56, 61

Camsa 47–48, 53, 56, 62
Carib 37, 56–57, 62
Catalan 205–207, 211–218, 220–226, 229, 249, 306
Chatino 38, 53, 55–56, 65
Chinese 78, 89, 211, 231, 250, 252–253
  Mandarin Chinese 210, 227, 233, 236, 248–249, 251
Chukchee, Chukchi 37, 56, 58–59, 63, 68, 81
Comanche 38, 56, 67

Dakota 37, 56, 65
Darai 73
Dutch 47, 53, 56, 59, 69, 74, 85, 87, 90, 100, 107, 113, 116, 158, 198, 206–207, 213, 221, 228, 257, 261–264, 270

Eggon 36, 56, 65
Egyptian 72
English 11, 13–14, 16, 18, 20–21, 29, 62–63, 73–74, 82–83, 112–113, 116, 151, 157–158, 160–161, 173–174, 198, 201–202, 206–207, 209, 221, 226–228, 251, 261, 278, 286, 291–293, 313–315, 341–342, 360, 366–368, 382, 397
  Contemporary English 15, 17
  Middle English 17
  Old English 15

Fijian 73, 151
French 56, 60, 67, 198, 250, 285–290, 292–295, 302, 306–307

Gansu 78–79
Georgian 38–39, 47–48, 53, 56, 59, 62, 66, 157, 160, 174, 227, 346, 366
German 11, 14, 16–17, 20, 30, 44–45, 53, 55–56, 63–64, 69, 74, 85, 90, 113, 153, 174, 202, 211, 226, 228, 249, 286, 311–313, 318, 338, 341–343
  Contemporary German 15
  Upper German 21
  Viennese German 205–207, 212–215, 218–225
  Standard German 315
Germanic 14, 20, 29–31, 56–58, 63, 104, 151, 153
Greek 18, 30, 39, 47, 53, 56, 61, 63, 68, 93–95, 97–98, 104, 106–107, 111–113, 115–117, 206–207, 221, 226, 375, 381, 391, 394–395, 397
  Classical Greek 16, 20
  Contemporary Greek 16
  Standard Greek 96, 99, 369–370, 385, 388, 393
  Northern Greek 369–370, 383
  Kozani Greek (KG) 100, 105, 114, 369–373, 377, 379, 383–390, 392–393, 396, 398
Greenlandic 57
  West Greenlandic 51, 61
Guanzhou 78–79

Hawaiian 73
Hebrew 53, 57
  Biblical Hebrew 21
  Modern Hebrew 39, 44–46, 48, 52, 58–59, 63–64
Hindi 53, 57, 62, 66, 82
Hua 41, 48, 54, 57, 62

Igbo 73, 81
Ijo 71–72
Irish 39, 44, 54–55, 57, 60–61
Italian 17, 20, 30, 112, 153, 157–159, 161, 167, 170–174, 198, 202, 207, 221, 227, 250, 289, 292
  Old Italian 11, 25
  Calabria 26
  Lombardy 25
  Lucania 25
  Campania 25
  Milanese 25
  Tuscan dialects 26
  Sicilian 26

Japanese 74, 78, 80, 82–83, 85–87, 90, 279, 373, 384, 392, 395–397

Kannada 82
Kanuri 80
Khasi 43–45, 47–48, 54, 57, 62, 66–67
Klamath 44, 54–55, 57–58
Korean 14, 17, 38, 80–87, 90, 151, 227, 367–368, 373, 395–396
Kurdish 81
Kutenai 39, 47, 54, 57, 61

Latin 11, 17–18, 30, 82–83, 153
Lezgian 81, 373, 394

Mazatec 38, 57
Manchu 78–79
Mba 73
Mixtecan 79
Moghol (Mongolic) 81, 84

Nambiqara 82, 90
Nanshang 78–79
Ngandi 74
Nisqually 36
Nivkh 84, 91

Otomi (Temoayan)

Pali 14, 21
Pashto 39, 54, 57, 66
Persian 36, 82, 91
Phoenician 21, 29
Polish 17, 30, 38, 54, 57, 62, 67
Popoluca 51, 54, 57, 61
Portuguese 11, 17–20, 382

Romance 17, 56–57, 200, 202, 306, 394
Romanian 47, 54, 57, 100, 107, 177–179, 181–184, 189, 193–194, 199–200, 202, 382
Russian 39, 41, 52, 54–55, 57–58, 64, 67–68, 82, 119–122, 124, 134–138, 142–146, 148, 150–152
Rutul 81

Samoyedic 135, 141
  Nenets 119–123, 125–129, 131–134, 136–137, 139–140, 142–144, 151–152