
How Random is the

Gender of German Nouns?


Prof Roger Bilisoly
Department of Mathematical Sciences
Central Connecticut State University
November 19, 2011

Grammatical gender in German
Unlike modern English (but like Old English), all nouns in
German are either masculine, feminine, or neuter:
Die Frau (woman) is feminine.
Das Weib (woman) is neuter.
Das Auto (car) is neuter.
Der Wagen (car) is masculine.
Der Felsen (rock) is masculine.
Die Schuld (guilt) is feminine.
Der Mittwoch (Wednesday) is masculine.
Some rules of German gender
Modern German grammar books have many rules.
Meaning-based rules (semantics)
Seasons, months, and days of the week are masculine.
Example: der Mittwoch (Wednesday) vs. die Woche (week)
Exception: das Frühjahr (spring) inherits neuter from das Jahr (year)
Female persons and animals are feminine.
Examples: die Frau (woman), die Kuh (cow)
Exception: das Weib (woman)
Form-based rules (morphology and phonology)
Nouns ending with the syllable -ung are feminine.
Examples: die Einladung (invitation), die Prüfung (exam)
Non-exception: der Sprung (jump) is just one syllable (note: its
masculine gender follows from the strong verb, not from an ending rule).
All rules are from Section 1.1 of Durrell (2002)
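
The -ung rule and its one-syllable caveat can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and counting maximal runs of vowel letters is a rough stand-in for real German syllabification.

```python
import re

# Assumption of this sketch: a syllable is approximated by one maximal run
# of vowel letters. Real German syllabification is more subtle.
VOWEL_RUN = re.compile(r"[aeiouyäöü]+", re.IGNORECASE)


def syllable_count(noun: str) -> int:
    """Approximate the number of syllables as the number of vowel runs."""
    return len(VOWEL_RUN.findall(noun))


def ung_rule(noun: str):
    """Return 'feminine' if the -ung rule applies, else None (rule is silent)."""
    # The ending must be a syllable of its own, so one-syllable nouns
    # such as Sprung are not covered by the rule.
    if noun.lower().endswith("ung") and syllable_count(noun) > 1:
        return "feminine"
    return None


for noun in ("Einladung", "Prüfung", "Sprung"):
    print(noun, ung_rule(noun))
```

Returning None rather than a guessed gender keeps the rule silent where it does not apply, which matches the slide's point that Sprung is a non-exception rather than a counterexample.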
Rules or no rules?
In German, however, the gender of only a very small
percentage of nouns can be predicted on the basis of
semantic and phonological properties of the noun. As
Mark Twain notes, there is no sense or system in the
distribution, that is, the German gender system is
largely arbitrary.
Pfau (2009), p. 109

This paper builds on previous evidence provided by the
authors showing that gender classification is not
arbitrary in German.
From the abstract of Zubin and Köpcke (1984)

Note: Highlighting is mine.
Statistical point of view:
How do we predict a noun's gender?
Both gender and our predictor variables are categorical,
so contingency tables of counts are useful for exploratory
data analysis (EDA).
For modeling, logistic regression (for binary response
variables) can be extended to multinomial regression.
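
As a minimal sketch of what a contingency table of counts looks like, the snippet below cross-tabulates gender against a coarse ending class. The nouns and genders are the examples quoted on the earlier slides. The predictor `ending_class` and its three levels are assumptions made for illustration, not the variables used in the talk.

```python
from collections import Counter

# (noun, gender) pairs taken from the examples on the earlier slides.
NOUNS = [
    ("Frau", "f"), ("Weib", "n"), ("Auto", "n"), ("Wagen", "m"),
    ("Felsen", "m"), ("Schuld", "f"), ("Mittwoch", "m"), ("Woche", "f"),
    ("Frühjahr", "n"), ("Jahr", "n"), ("Kuh", "f"),
    ("Einladung", "f"), ("Prüfung", "f"), ("Sprung", "m"),
]


def ending_class(noun: str) -> str:
    """Hypothetical categorical predictor: a coarse class of the ending."""
    lower = noun.lower()
    if lower.endswith("ung"):
        return "-ung"
    if lower.endswith("en"):
        return "-en"
    return "other"


def contingency_table(pairs, predictor):
    """Count nouns for every (predictor level, gender) combination."""
    return Counter((predictor(noun), gender) for noun, gender in pairs)


table = contingency_table(NOUNS, ending_class)
print("ending  m  f  n")
for level in ("-ung", "-en", "other"):
    print(level.ljust(7), *(table[(level, g)] for g in ("m", "f", "n")))
```

Because this predictor looks only at letters, Sprung is counted in the -ung row even though the syllable-based rule does not cover it. Fourteen nouns are far too few for inference, so the table only shows the layout.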
CV and VC clusters include many
well-known gender indicators.
Durrell (2002), Chapter 1, notes that -er, -en*, and -el are predominantly masculine;
-us is masculine; -ung, -sion, -tion, -tät, and -ik are feminine; and -e is predominantly feminine.
*Note that -en is mostly neuter in the table above.
A contingency table analysis of -ie:
The masculine exceptions are Brie, Goalie,
Hippie, Junkie, Oldie, Teenie, Yuppie, and Laie.
The first 7 are loan words, 6 of which refer to
types of people and hence are masculine. Laie
(layman) also refers to a type of person.

The neuter exceptions are Genie, Knie, and
Portemonnaie. Genie (genius) is anomalous in
not being masculine, and Portemonnaie (purse) is a
loan word.
Köpcke and Zubin analyzed one-syllable nouns.
They and MacWhinney et al. (1989) claim that the proportion of
consonants is correlated with masculine gender. So define the
variable: excess = # consonants - # vowels.
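
A minimal sketch of how excess could be computed from spelling. It assumes that letters, not phonemes, are counted, that ä, ö, and ü are vowels, and that y and ß are consonants. The slides do not state these conventions, so treat them as assumptions of this example.

```python
# Assumption: vowels are counted by letter; y and ß count as consonants.
VOWELS = set("aeiouäöü")


def excess(noun: str) -> int:
    """Return (# consonant letters) - (# vowel letters) for a noun."""
    letters = [ch for ch in noun.lower() if ch.isalpha()]
    n_vowels = sum(ch in VOWELS for ch in letters)
    n_consonants = len(letters) - n_vowels
    return n_consonants - n_vowels


for noun in ("Sprung", "Kuh", "Frau"):
    print(noun, excess(noun))
```

Under these assumptions Sprung has five consonant letters and one vowel letter, so its excess is 4, while Frau has two of each and an excess of 0.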
We'll fit a multinomial logistic regression to predict gender as a
function of x = excess. We'll use neuter as the reference gender.
The model is below, and the next slide gives an interpretation of it.
See http://support.sas.com/rnd/app/da/new/802ce/stat/chap8/sect5.htm
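\[
\log\!\left(\frac{P(\text{Gender} = \text{Fem} \mid x)}{P(\text{Gender} = \text{Neut} \mid x)}\right) = \beta_{0,\text{fem}} + \beta_{1,\text{fem}}\, x
\]
\[
\log\!\left(\frac{P(\text{Gender} = \text{Masc} \mid x)}{P(\text{Gender} = \text{Neut} \mid x)}\right) = \beta_{0,\text{masc}} + \beta_{1,\text{masc}}\, x
\]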
Conclusion: excess is related to gender.
Both masculine and feminine
genders are related to the
variable excess. For example, if
excess increases by 1, the odds
of masculine vs. neuter
increase by 58.5% (an odds ratio of 1.585).
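\[
\frac{P(\text{Gender} = \text{Masc} \mid x = 2)\,/\,P(\text{Gender} = \text{Neut} \mid x = 2)}
     {P(\text{Gender} = \text{Masc} \mid x = 1)\,/\,P(\text{Gender} = \text{Neut} \mid x = 1)}
= \frac{e^{\beta_{0,\text{masc}} + \beta_{1,\text{masc}} \cdot 2}}{e^{\beta_{0,\text{masc}} + \beta_{1,\text{masc}} \cdot 1}}
= e^{\beta_{1,\text{masc}}} = 1.585
\]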
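
The odds-ratio interpretation can be checked numerically with a small sketch of the baseline-category logit model. Only the masculine slope is taken from the slides (as the log of the reported odds ratio 1.585). The intercepts and the feminine slope are made-up placeholders, and the function names are hypothetical.

```python
import math

# Masculine-vs-neuter slope implied by the reported odds ratio of 1.585.
BETA1_MASC = math.log(1.585)

# (intercept, slope) per non-reference gender. Everything except BETA1_MASC
# is a HYPOTHETICAL placeholder, not a fitted value from the talk.
COEF = {
    "masc": (-0.5, BETA1_MASC),
    "fem": (0.2, -0.3),
}


def gender_probs(x, coef=COEF):
    """Baseline-category logit probabilities with neuter as the reference."""
    scores = {g: math.exp(b0 + b1 * x) for g, (b0, b1) in coef.items()}
    denom = 1.0 + sum(scores.values())  # the 1.0 is the neuter term, exp(0)
    probs = {g: s / denom for g, s in scores.items()}
    probs["neut"] = 1.0 / denom
    return probs


def odds_masc_vs_neut(x):
    """Odds of masculine relative to neuter at a given value of excess."""
    p = gender_probs(x)
    return p["masc"] / p["neut"]


ratio = odds_masc_vs_neut(2) / odds_masc_vs_neut(1)
print(round(ratio, 3))
```

The ratio does not depend on the placeholder intercepts or on the feminine coefficients, because they cancel when the two odds are divided. The same ratio results for any pair of excess values that differ by 1.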
References
Beiler, Benedictus (1736). A New German Grammar, whereby an Englishman
may Easily Attain to the Knowledge of the German Language.
Berkemeyer, Victoria (1994). Anaphoric Resolution and Text Comprehension for
Readers of German. Die Unterrichtspraxis/Teaching German, 27, 15-22.
Bloomfield, Leonard (1914). An Introduction to the Study of Language.
Durrell, Martin (2002). Hammer's German Grammar and Usage, 4th Edition.
MacWhinney, Brian, Jared Leinbach, Roman Taraban, and Janet McDonald
(1989). Language Learning: Cues or Rules? Journal of Memory and Language,
28, 255-277.
Pfau, Roland (2009). Grammar as Processor: A Distributed Morphology Account
of Spontaneous Speech Errors.
Rice, Curt (2006). Optimizing Gender. Lingua, 116, 1394-1417.
Schwichtenberg, Beate, and Niels Schiller (2004). Semantic Gender Assignment
Regularities in German. Brain and Language, 90, 326-337.
Wendeborn, Gebhard (1797). An Introduction to German Grammar, 3rd Edition.
Zubin, David, and Klaus-Michael Köpcke (1984). Affect Classification in the
German Gender System. Lingua, 63, 41-96.
Beginnings are not considered useful
with respect to gender except for Ge-.
Note that only 3 cell chi-squares
exceed 10, and all of these are
for Ge-.
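
A cell chi-square is the contribution (O - E)^2 / E of one cell, where O is the observed count and E is the count expected under independence (row total times column total, divided by the grand total). The sketch below computes these contributions for a small table. The counts are hypothetical numbers chosen for easy arithmetic, not the data behind the slide.

```python
def cell_chi_squares(table):
    """Return (O - E)^2 / E for every cell of a table given as a list of rows.

    Assumes every row and column total is positive, so E is never zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    result = []
    for r, row in enumerate(table):
        out_row = []
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / total
            out_row.append((observed - expected) ** 2 / expected)
        result.append(out_row)
    return result


# HYPOTHETICAL counts, columns = masculine, feminine, neuter.
counts = [
    [10, 10, 40],  # nouns beginning with Ge- (made-up numbers)
    [90, 90, 60],  # all other nouns (made-up numbers)
]

cells = cell_chi_squares(counts)
for row in cells:
    print([round(value, 2) for value in row])
```

Summing all cell values gives the usual Pearson chi-square statistic. Looking at the cells individually shows which combinations of beginning and gender drive it.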
What linguists have done
They have considered semantics.
Example: Seasons, months, days of the week, etc.
They have posited default rankings when conflicting rules do
not determine a unique gender.
Most marked (so least likely to be assigned) is neuter.
Least marked (so most likely to be assigned) is masculine.
See p. 1405 of Rice (2006).
They have introduced new semantic variables.
Example: Section 2 of Zubin and Köpcke (1984) proposes a variable
measuring extroversion/introversion and applies it to the irregular -mut
compounds:
Die Anmut (gracefulness), die Demut (humility), and die Sanftmut
(tenderness) vs. der Heldenmut (heroism), der Lebensmut
(exhilaration), and der Unmut (bad temper).
The final letters of a noun are correlated with gender.

Durrell (2002), Chapter 1, says that -e is mainly feminine, and that -ie and -a* are feminine.
*Note that there is a higher-than-expected number of neuter words ending in -a:
Dogma, Drama, Komma, Omega, Plasma, Schema, Zebra, etc.
