
Representation of IPA with ASCII

The IPA is a relatively complex system of symbols, intended to facilitate the representation of every sound speakable by humans. It uses many symbols not in the Latin alphabet, and as such is not directly renderable in 7-bit ASCII. For that matter, it's not even directly representable in ISO-Latin-1. So until Unicode becomes more standard, users of IPA on the internet have two options: make graphics or fake it up with ASCII. As for the former, check out my webfont, used to do these pages, but in email and in newsgroups and other text-only media, even graphics are out.

Unicode is a lot better supported now than when I first did these pages in 1997, but still not perfect. Recent-model web browsers can handle it (assuming the correct fonts are installed), though there are still a lot of people using older browsers, and of course mailing lists and newsgroups still have difficulty with it. In any case, to insert Unicode into HTML, you type &#x000; into your document, replacing "000" with the hex codes given below. For instance, if I type "&#x250;", I get "ɐ": that's a turned a, in case your browser doesn't display it right. Your own authoring software may give you a less painful way to do this. The leftmost column below has the IPA in this form, and is pasteable text on systems that support that. (I've left the graphical version in for those whose browsers don't do Unicode yet.)

For those who want to use a unicodeless forum, or appeal to a broader audience, then, we have ASCII-IPA. The problem is, no representation is perfect; different systems are better for different purposes. Here, though, are several commonly used ASCII-IPA systems: Kirshenbaum, Coutts-Barrett, Branner, Carrasquer, and SAMPA. Kirshenbaum is popular among hobbyists because it tries to stay close to the physical representation of ASCII, or else have a decent mnemonic, for most things. SAMPA seems more popular among professional linguists for reasons which elude me.
And of course, in addition to these schemes you will see various home-grown schemes, which may or may not be marked as such. But notice that the most commonly used IPA symbols tend to be pretty uniform among all the schemes. Usually if an author uses one of the more esoteric bits of IPA, e will specify what scheme e's using to transcribe it to ASCII. And without further ado...

IPA a b c c Unicode
U+0061 U+0250 U+0251 U+0252 U+00E6 U+028C U+0062 U+0253 U+0299 U+03B2 U+0063 U+0063 U+02BC

Kirsh. a

A A. & V b b` b<trl> B c

CB Bran. Carr. SAMPA Description Cardinal vowel 4: open front unrounded a a a a ("lower-case a") Almost fully open central unrounded vowel ;a a& a" 6 ("turned a") Cardinal vowel 5: open back unrounded @ A A A ("script a") Cardinal vowel 13: open back rounded ;@ A& A" Q ("turned script a") Almost fully open front unrounded vowel .ae ae) & { ("ash") Cardinal vowel 14: open mid back ;v v& A* V unrounded ("inverted v") b b b b" B" B c c` B\ B c b Voiced bilabial stop ("lower-case b") Voiced bilabial implosive ("hooktop b") Bilabial trill ("small capital b") Voiced bilabial fricative ("beta") Voiceless palatal stop ("lower-case c") Palatal ejective

(b b$ B B

|B V c c' c

d d e

U+00E7 U+0255 U+0064 U+0064 U+032a U+0257 U+0256 U+00F0 U+0065 U+0259 U+025a U+0258 U+025b

>c c" &c ci)

C c" d

C s\ d

Voiceless palatal fricative ("c cedilla") Voiceless alveolo-palatal fricative ("curly-tail c") Voiced alveolar stop ("lower-case d") Voiced dental stop

d d[ d` d. D e @ R @ E

(d d$ <d dr) -d e ;e D e @

d" d. D e @ d` D e @

Voiced dental/alveolar implosive ("hooktop d") Voiced retroflex stop ("right-tail d") Voiced dental fricative ("eth") Cardinal vowel 2: close-mid front unrounded ("lower-case e") Mid central vowel ("schwa") Rhotic mid central vowel ("right-hook schwa") Close-mid central unrounded vowel ("reversed e") Cardinal vowel 3: open-mid front unrounded ("epsilon") alt Open-mid central unrounded vowel ("reversed epsilon") alt Rhotic open-mid central unrounded vowel ("right-hook reversed epsilon") Open-mid central rounded vowel ("closed epsilon") Voiceless labiodental fricative ("lower-case f") Voiced velar stop ("lower-case g") Voiced labiovelar stop ("g-b tie")

9 ;3 E

e& E

e" E

@\ E

U+025c

V"

3 ;E

E&

E"

U+025d U+025e U+0066 U+0261

R<umd> O" f g d<lbv> W E" f g f g g))b (g g$ G G g" G" G* G G _G o* h 7 h E* f g 3\ f g

U+0260 U+0262 U+029b U+0263 U+02e0 U+0264 U+0068

g` G G` Q ~ oh

Voiced velar implosive ("hooktop g") Voiced uvular stop ("small capital g") Voiced uvular implosive ("hooktop small capital g") Voiced velar fricative ("gamma") Velarized diacritic ("superscript gamma") Cardinal vowel 15: close-mid back unrounded ("baby gamma") Voiceless glottal fricative ("lower-case h")

(G G$ &V g" g^ &v U" h h

i j k k

U+02b0 U+0127 U+0266 U+0267 U+0265 U+029c U+0069 U+0268 U+026a U+006a U+02b2 U+029d U+025f U+0284 U+006b U+006B U+02BC

<h> H h<?>

^h h^ -h hH h" S* w" H" i i" I j

_h X\ h\ x\ H H\ i 1 I j ', _j j" J J" k k` k j\

Aspirated diacritic ("superscript h") Voiceless pharyngeal fricative ("crossed h") Voiced glottal fricative ("hooktop h") Simultaneous S and x ("hooktop heng") Voiced labial-palatal approximant ("turned h") Voiceless epiglottal fricative ("small capital h") Cardinal vowel 1: close front unrounded ("lower-case i") Close central unrounded vowel ("barred i") Almost fully close front unrounded vowel ("small capital i") Palatal approximant ("lower-case j") Palatalized diacritic ("superscript j") Voiced palatal fricative ("curly-tail j") Voiced palatal stop ("barred dotless j") Voiced palatal implosive ("barred esh") Voiceless velar stop ("lower-case k") Velar ejective Voiceless labiovelar stop ("k-p tie") ("turned k")

(h h" (h) Sx)

j<rnd>

;h H

h& H i iI j j^

i i" I j ;

i -i I j ^j

C<vcd> &j j" J J` k k` t<lbv> k! k k' -j jj$ k k` k))p

l l

U+006c U+006C U+032A

l l[ <lat>

Alveolar lateral approximant ("lower-case l") Dental lateral

^l ~l

l^

_l 5 s* l. z* L" m K l` K\ L\ m

Lateral release diacritic ("superscript l") ("l with tilde") Voiceless alveolar lateral fricative ("belted l") Retroflex lateral approximant ("l with right tail") Voiced alveolar lateral fricative ("l-ezh ligature") Velar lateral approximant ("small capital l") Bilabial nasal ("lower-case m")

U+026b U+026c U+026d U+026e U+029f U+006d

s<lat> l. z<lat> L m

&l l<l lr) .lZ Z" L m L m

U+0271 U+026f

M u-

m) M ;m W m&

M U"

n n

U+0270 U+006e U+006E U+032A

j<vel> n n[

;|m W" n n

W" M\ n n

Labiodental nasal ("m with leftward tail at right") Cardinal vowel 16: close back unrounded ("turned m") Alternate Velar approximant ("turned m with long right leg") Alveolar nasal ("lower-case n") Dental nasal

^n n^
U+0272 U+014b

_n n" N J N

Nasal release diacritic ("superscript n") Palatal nasal ("n with leftward hook at left") Velar nasal ("eng") Labiovelar nasal ("eng-m tie")

n^ N n<lbv>

|n

nj)

n) ng) ng)))m <n nr) N o N o

U+0273 U+0274 U+006f U+0298 U+0275 U+00f8 U+0153 U+0276 U+0254

n. n" o p! @. Y &. W O

n. N" o p! e* o" O" &* O

n` N\ o O\ 8 2 9 & O

Retroflex nasal ("n with right tail") Uvular nasal ("small capital n") Cardinal vowel 7: close-mid back rounded ("lower-case o") Bilabial click ("bull's eye") Close-mid central rounded vowel ("barred o") Cardinal vowel 10: close-mid front rounded ("slashed o") Cardinal vowel 11: open-mid front rounded ("o-e ligature") Cardinal vowel 12: open front rounded ("small capital o-e ligature") Cardinal vowel 6: open-mid back rounded ("turned c") Alternate Voiceless bilabial stop ("lower-case p") Bilabial ejective Voiceless bilabial fricative ("phi")

!* p! -o o-

% o" .oe oe) .OE OE) ;c O c&

p p q q r

U+0070 U+0070 U+02BC U+0278 U+0071 U+0071 U+02BC U+0072 U+027e U+027c

p p` F q q` r<trl> *

p p' |o q q' r ;J

p p` F q

p p` P q q`

Voiceless uvular stop ("lower-case q") Uvular ejective

r d" Rr)

r r" r*

Alveolar trill ("lower-case r") Alveolar flap ("fish-hook r") Retroflex trill ("r with long leg")

r s s

U+027d U+0279 U+0072 U+032A U+027b U+027a U+0280 U+0281 U+0073 U+0073 U+02BC U+0282

*. r r[ r. *<lat> r" g" s s` s.

<r r" ;r r&

r. R r\

Retroflex flap ("r with right tail") Alveolar approximant ("turned r") Dental approximant

<;r jr) ;|r R l" R

R. l" R" R* s

r\` l\

Retroflex approximant ("turned r with right tail") Alveolar lateral flap ("turned long-legged r") Uvular trill ("small capital r")

;R R& s s s` <s sr) s!

R s

Uvular fricative ("inverted small capital r") Voiceless alveolar fricative ("lower-case s") Alveolar fricative ejective

s.

s`

Voiceless retroflex fricative ("s with right tail") Voiceless postalveolar fricative ("esh") Voiceless alveolar stop ("lower-case t") Voiceless dental stop Voiceless dental ejective

t t t t u v w x

U+0283 U+0074 U+0074 U+032A U+0074 U+032A U+02BC U+0074 U+02BC U+0288 U+03b8 U+0075 U+0289 U+028a U+0076 U+028b U+0077 U+02b7 U+028d U+0078 U+03c7

S t t[ t[` t` t. T u u" U v r<lbd> w <w>

S t

S t

S t

S t

t'

t`

t` t. T u u" U v V w t` T u } U v P, v\ w _w W x X W x X

Dental/alveolar ejective Voiceless retroflex stop ("t with right tail") Voiceless dental fricative ("theta") Cardinal vowel 8: close back rounded ("lower-case u") Close central rounded vowel ("barred u") Almost fully close back rounded vowel ("upsilon") Voiced labiodental fricative ("lower-case v") Labiodental approximant ("script v") Voiced labial-velar approximant ("lower-case w") Labialized diacritic ("superscript w") Voiceless labial-velar fricative ("inverted w") Voiceless velar fricative ("lower-case x") Voiceless uvular fricative ("chi")

<t tr) -O T u -u U v V w u uU v v" w

^w w^

w<vls> ;w w& x X x X x X

y z

U+0079 U+028e U+028f U+007a U+0291 U+0290 U+0292 U+0294

y l^ I. z

y ;y Y z

y y& Y z

y L Y z z" z. Z ?

y L Y z z\ z` Z ?

Cardinal vowel 9: close front rounded ("lower-case y") Palatal lateral approximant ("turned y") Almost fully close front rounded vowel ("small capital y") Voiced alveolar fricative ("lower-case z") Voiced alveolo-palatal fricative ("curly-tail z") Voiced retroflex fricative ("z with right tail") Voiced postalveolar fricative ("ezh") Glottal stop Glottal stop (optional substitute for ?)

&z zi) z. Z ? <z zr) Z ? Z ? Q

U+02a1 U+0295 U+02a2

-? H<vcd> ;?

??&

?" # #"

>\ ?\ <\

Epiglottal plosive ("barred glottal stop") Voiced pharyngeal fricative ("reversed glottal stop") Voiced epiglottal fricative ("barred reversed glottal stop") Alternate Pharyngealized diacritic ("superscript reversed glottal stop") (Post)alveolar click ("exclamation point") Dental click ("pipe") Palatoalveolar click ("double-barred pipe") Alveolar lateral click ("double pipe") Minor (foot) group ("pipe") Major (intonation) group ("double pipe")

-;? ?" ?&-

U+02e4 U+01c3 U+01c0 U+01c2 U+01c1

<H> c! t! c! l!

^;? ?&^ !P r! !D t! !# c! !! | || l! | || _ [ ] + = =" c! t! c! l!

_?\ !\ |\ =\ |\|\

U+0320 U+032a U+033a U+031f U+031d U+031e

_ [ ] +

__d _a _+ _r _o " %

Retracted diacritic ("under-bar") Dental diacritic ("subscript bridge") Apical diacritic ("subscript inverted bridge") Advanced diacritic ("subscript plus") Raised diacritic ("raising sign") Lowered diacritic ("lowering sign") Primary stress ("superior vertical stroke") Secondary stress ("inferior vertical stroke")

U+02c8 U+02cc

' ,

' ,

' ,

U+0329 U+031a

<o>

,)

=, _= _}

Syllabic diacritic ("syllabicity mark") No audible release diacritic ("corner") Syllable break ("period")

^7 .) . .

U+002e U+02d1 U+0308 U+0324

." ; " <?> : ` <o> ;" : "^ h") : ` ^o V)

:\ _" _t :

Half-long ("half-length mark") Centralized diacritic ("umlaut") Breathy voiced diacritic ("subscript umlaut") Long ("length mark") Ejective ("apostrophe")

U+02d0 U+02bc U+0325 U+030a U+031c U+0339 U+0303 U+0334 U+0330 U+032c U+0306 U+032f

_0

Voiceless diacritic ("under-ring") Voiceless diacritic (use if character has descender) Less rounded diacritic ("subscript left halfring") More rounded diacritic ("subscript right halfring") Nasalized diacritic ("superscript tilde") Velarized or pharyngealized diacritic ("superimposed tilde") Creaky voiced diacritic ("subscript tilde") Voiced diacritic ("subscript wedge") Extra-short ("breve") Non-syllabic diacritic ("subscript arch") Tie bar ("top ligature") Linking (absence of a break) ("bottom ligature") Mid-centralized diacritic ("superscript x")

< > ~ ~ ~ = ;~

U) u) ~^ ~) ~

_c _O ~, _~ _e _k _v _X _^ _ -\

^v v) .' ;$ . (^ ( )) =)

U+033d

^x x^ <r> ^r [] r^ [] < > } ^ { / ` _m _A _q _N

U+02de U+033b U+0318 U+0319 U+033c

Rhoticity diacritic ("rhoticity mark") Laminal diacritic ("subscript box") Advanced tongue-root diacritic ("advancing sign") Retracted tongue-root diacritic ("retracting sign") Linguolabial diacritic ("subscript seagull") Global rise ("diagonal up arrow")

U+2197

U+2191 U+2198 U+2193 U+030f

/) ;^ \ \) 1 11 13 15

Upstep ("up arrow") Global fall ("diagonal down arrow")

! _B

Downstep ("down arrow") (tones) (tones)

_R _L _B_L _M _H_T _H _F

(tones) (tones) (tones) (tones) (tones) (tones) (tones) (tones)

U+0300

22 31

U+0304

33 35

U+0301

44 51 53

U+030b

55

_T

(tones)

Usenet IPA/ASCII transcription


In August of 1992, some of the readers of the Usenet newsgroups sci.lang and alt.usage.english got fed up with the common threads in which posters tried to describe how words were pronounced (by them or in dialects under discussion) by reference to how other words were pronounced (by the author). Since individuals pronounce different words differently, this tended to lead to (occasionally interesting, but often merely) long, fruitless threads.

There already was a scheme occasionally used for noting transcription, but it suffered from (among other things) the fact that it was highly skewed toward describing English. This made it less than useful for the denizens of sci.lang. Since there already existed a notation (the International Phonetic Alphabet, or IPA) for precisely specifying phonemic and phonetic values, several of us decided that it couldn't be too hard to put together a reasonable transcription scheme of IPA into 7-bit ASCII characters.

We naturally had to allow some of the IPA symbols to map onto multiple characters (since there are more IPA symbols than ASCII characters), but we finally settled on a scheme in which each segment is represented by a single character, potentially followed by some number of "diacritics", which can either be single characters or delimited tokens. [We also came up with a very narrow feature-based representation for use when precision is needed or when no symbol completely fits the bill.] Unlike some other such attempts, we took it as a given that this transcription had to be directly readable, so each character needed to be at least somewhat evocative of its IPA value.

It is expected that when the Unicode/ISO 10646 character set becomes commonly used for mail, news, and web pages, this transcription will no longer be needed, as the IPA characters will be able to be used directly. Included in this archive are the specification itself and the "Pronunciation Symbols" page of Merriam-Webster's New Collegiate Dictionary, done over in this transcription. The latter should be of use to American English speakers who are not used to the IPA symbols. In the future I hope to add a version of the specification which includes images of the actual IPA characters as well as sound clips of each of the segments.

FAQ: Representing IPA Phonetics in ASCII


Evan Kirshenbaum <kirshenbaum@hpl.hp.com>
[Last Modified, 4 Jan 1993] [Error corrected 22 Jan 2001]

This article describes a standard scheme for representing IPA transcriptions in ASCII for use in Usenet articles and email. The following guidelines were kept in mind:

- It should be usable for both phonemic and narrow phonetic transcription.
- It should be possible to represent all symbols and diacritics in the IPA.
- The previous guideline notwithstanding, it is expected that (as in the past) most use will be in transcribing English, so where tradeoffs are necessary, decisions should be made in favor of ease of representation of phonemes which are common in English.
- The representation should be readable.
- It should be possible to mechanically translate from the representation to a character set which includes IPA. The reverse would also be nice.

In order to be able to represent a wide range of segments while making common segments easy to type, we allow more than one representation for a given segment. Each segment has an "explicit" representation, which is a set of features between curly braces ("{" and "}"). Each feature is represented as a three letter abbreviation taken from a standardized set. The phoneme /b/ (a voiced, bilabial stop) could be represented as /{vcd,blb,stp}/. A first cut at the feature set appears in appendix A below. The word tag could thus be represented phonemically as
/{vls,alv,stp}{low,fnt,unr,vwl}{vcd,vel,stp}/

and phonetically as
[{vls,asp,alv,stp}{low,fnt,lng,unr,vwl}{unx,vcd,vel,stp}]

This works, but it's a bit of a pain. To simplify transcription, we allow an "implicit" representation for a segment which consists of a (generally alphabetic) symbol followed by diacritics. Thus /b/ stands for /{vcd,blb,stp}/. Case is significant (/n/ and /N/ are different segments). The segment symbols are given in appendix B below. The word tag can thus be represented phonemically as
/t&g/
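As a rough illustration of how the explicit and implicit forms relate, the following Python sketch parses an explicit transcription into feature sets and expands an implicit one through a small lookup table. The names `SEGMENTS`, `parse_explicit`, and `expand_implicit` are hypothetical, the table is only an excerpt of appendix B, and diacritics and phonetic brackets are not handled here.

```python
import re

# Excerpt of the segment symbols in appendix B (assumption: only the
# symbols needed for this example are listed, not the full table).
SEGMENTS = {
    "b": frozenset({"vcd", "blb", "stp"}),
    "t": frozenset({"vls", "alv", "stp"}),
    "g": frozenset({"vcd", "vel", "stp"}),
    "&": frozenset({"low", "fnt", "unr", "vwl"}),
}


def parse_explicit(transcription):
    """Turn '/{a,b}{c,d}/' into a list of feature sets, one per segment."""
    body = transcription.strip("/")
    return [frozenset(group.split(","))
            for group in re.findall(r"\{([^{}]*)\}", body)]


def expand_implicit(transcription):
    """Turn '/t&g/' into a list of feature sets using the lookup table."""
    body = transcription.strip("/")
    return [SEGMENTS[symbol] for symbol in body]


explicit = parse_explicit("/{vls,alv,stp}{low,fnt,unr,vwl}{vcd,vel,stp}/")
implicit = expand_implicit("/t&g/")
print(explicit == implicit)  # True
```

Feature sets are stored as frozensets because the order of features inside the braces carries no meaning.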

The diacritics for a segment are represented between angle brackets ("<" and ">") and consist of symbols or features. (In the common case where the diacritic symbol is a single character which does not encode a segment, the brackets may be removed.) The features which the diacritics map to override those of the segment.

The word tag thus becomes narrowly


[t<asp>&<lng>g<unx>]

or
[t<h>&<:>g<o>]

or
[t<h>&:g<o>]
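The bracketed and unbracketed diacritic forms can be told apart mechanically. Here is a minimal Python sketch, assuming a single word with no spaces or stress marks; the function name `tokenize` is hypothetical, and the set of single-character diacritics is taken from appendix C.

```python
# Single-character diacritics listed in appendix C. Anything else that is
# not inside <...> is treated as a segment symbol (a simplification).
SINGLE_CHAR_DIACRITICS = set('~:-!.`[;"^+=')


def tokenize(transcription):
    """Split '[t<h>&:g<o>]' into (segment, [diacritics]) pairs."""
    body = transcription[1:-1]  # drop one enclosing delimiter pair
    segments = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "<":
            # Delimited diacritic: read up to the closing bracket.
            end = body.index(">", i)
            diacritic = body[i + 1:end]
            i = end + 1
        elif ch in SINGLE_CHAR_DIACRITICS:
            diacritic = ch
            i += 1
        else:
            segments.append((ch, []))
            i += 1
            continue
        if not segments:
            raise ValueError("diacritic with no preceding segment")
        segments[-1][1].append(diacritic)
    return segments


print(tokenize("[t<h>&:g<o>]"))
# [('t', ['h']), ('&', [':']), ('g', ['o'])]
```

Note that "[" is both the phonetic delimiter and the dental diacritic; the sketch strips only the outermost pair of delimiters so that an inner "[" is still read as a diacritic.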

Some diacritic symbols encode more than one feature set. Which one is meant should be apparent from context. For example, "." stands for "{rnd}" when attached to a vowel, but "{rfx}" when attached to a consonant.

Clicks are common to many languages (especially in Africa), but there is no IPA diacritic that means "click". Rather than use up several characters for clicks (which are infrequent in the languages most often discussed), we instead use the diacritic "!" after the homorganic unvoiced stop. Thus /t!/ (= /t<clk>/ = /{alv,clk}/) is the sound commonly written "tsk" and used in English to show disapproval.

The complete set of diacritic symbols appears in appendix C below. Appendices D and E contain representations of segments more or less ordered by feature (appendix D in tabular form, appendix E as a list). Appendix F contains a list of all of the ASCII characters and the uses to which they have been put.

For transcription of any specific language a group can by convention alter the character mappings (as an example, for Spanish /R/ may be better used to represent /{alv,trl}/ than /{mid,cnt,rzd,vwl}/). An author may also press a little-used symbol (for the language under consideration) into service to highlight a distinction. Such an alteration should be made explicitly to avoid confusion.

The diacritics "+" and "=" and the segment symbols "$" and "%" are explicitly left unspecified so that they can be used to mark language-specific features (that are otherwise cumbersome to mark). Such symbols can be assigned either by convention for a specific language or in an ad-hoc manner by an individual author.

Stress marks are prepended to the syllable they attach to. "'" signals primary stress, "," signals secondary stress. Spaces should be employed to separate words (cliticized words may be written unseparated). When discussing single words, it may be helpful to insert a space before each syllable that doesn't carry a suprasegmental marker.

Thus, "I hear the secretary" for an American might be something like
/aI hir D@ 'sEkrI,t&ri/

while to an Englishman it might be more like


/aI hi@ DI 'sEkrVtri/
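The word and stress conventions used in these two examples are simple enough to read mechanically. The Python sketch below splits a phonemic transcription into words and then into stress-marked groups; the function name `stress_groups` is hypothetical. It makes no attempt to find syllable boundaries that are not marked, so an unmarked group may span several syllables.

```python
import re

# Stress marks as described in the text: they precede the syllable.
STRESS = {"'": "primary", ",": "secondary"}


def stress_groups(transcription):
    """Return, for each word, a list of (stress, text) pairs."""
    body = transcription.strip("/")
    words = []
    for word in body.split():
        groups = []
        # Each chunk is an optional stress mark plus the text up to the
        # next stress mark.
        for chunk in re.findall(r"[',]?[^',]+", word):
            if chunk[0] in STRESS:
                groups.append((STRESS[chunk[0]], chunk[1:]))
            else:
                groups.append((None, chunk))
        words.append(groups)
    return words


for word in stress_groups("/aI hir D@ 'sEkrI,t&ri/"):
    print(word)
# [(None, 'aI')]
# [(None, 'hir')]
# [(None, 'D@')]
# [('primary', 'sEkrI'), ('secondary', 't&ri')]
```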

Transcribing tone is harder. Here's an attempt. For register tone languages (e.g., Hausa, Navajo), numbers should be used with one being the lowest. Thus in Navajo, "1" is low tone and "2" is high. In Yoruba "1" is low, "2" is mid, and "3" is high. The language's "default" tone need not be specified. For contour tone languages (e.g., Mandarin, Thai), there is generally a numeric system in place (Mandarin: "1" is high, "2" is rising, "3" is falling-rising, "4" is falling). The tone indication should follow the syllable (vowel?).

The symbol "#" is used to represent a syllable or word boundary.

Appendix A. Feature Abbreviations


vcd   voiced
vls   voiceless
nas   nasal
orl   oral
fnt   front
cnt   center

blb   bilabial
lbd   labio-dental
dnt   dental
alv   alveolar
rfx   retroflex
pla   palato-alveolar
pal   palatal
vel   velar
lbv   labio-velar
uvl   uvular
phr   pharyngeal
glt   glottal
stp   stop
frc   fricative

apr   approximant
vwl   vowel
lat   lateral
ctl   central
trl   trill
flp   flap
clk   click
ejc   ejective
imp   implosive
hgh   high
smh   semi-high
umd   upper-mid
mid   mid
lmd   lower-mid
low   low

bck   back
unr   unrounded
rnd   rounded
asp   aspirated
unx   unexploded
syl   syllabic
mrm   murmured
lng   long
vzd   velarized
lzd   labialized
pzd   palatalized
rzd   rhoticized
nzd   nasalized
fzd   pharyngealized
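Because the abbreviations form a fixed vocabulary, a feature set can be spelled out in words with a plain lookup. The following Python sketch shows the idea; `FEATURE_NAMES` holds only an excerpt of the list above, and the function name `describe` is hypothetical.

```python
# Excerpt of the feature abbreviations in appendix A (not the complete set).
FEATURE_NAMES = {
    "vcd": "voiced", "vls": "voiceless",
    "blb": "bilabial", "alv": "alveolar", "vel": "velar",
    "stp": "stop", "frc": "fricative", "nas": "nasal",
    "low": "low", "fnt": "front", "unr": "unrounded", "vwl": "vowel",
}


def describe(feature_set):
    """Spell out a feature set such as '{vcd,blb,stp}' in words."""
    abbreviations = feature_set.strip("{}").split(",")
    unknown = [a for a in abbreviations if a not in FEATURE_NAMES]
    if unknown:
        raise KeyError(f"unknown feature abbreviation(s): {unknown}")
    return " ".join(FEATURE_NAMES[a] for a in abbreviations)


print(describe("{vcd,blb,stp}"))  # voiced bilabial stop
```

Rejecting unknown abbreviations is a design choice of the sketch: it catches typos in a transcription early instead of passing them through silently.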

Appendix B. Segment Symbols


This table lists the symbol, the associated feature set, and the Unicode character code and name for the corresponding IPA character. In some cases (e.g., /I/) there are multiple IPA characters in use for the segment. I have listed both. In some cases (e.g. /j/), the IPA symbol seems to be ambiguous (generally between an approximant and the homorganic voiced fricative). The entries marked with "??" are those that I am least sure of. When I have listed more than one possibility for a symbol, the first is my current preference.
a   {low,cnt,unr,vwl}        U+0061 LATIN SMALL LETTER A
b   {vcd,blb,stp}            U+0062 LATIN SMALL LETTER B
c   {vls,pal,stp}            U+0063 LATIN SMALL LETTER C
d   {vcd,alv,stp}            U+0064 LATIN SMALL LETTER D
e   {umd,fnt,unr,vwl}        U+0065 LATIN SMALL LETTER E
f   {vls,lbd,frc}            U+0066 LATIN SMALL LETTER F
g   {vcd,vel,stp}            U+0067 LATIN SMALL LETTER G
                             U+0261 LATIN SMALL LETTER SCRIPT G
h   {glt,apr}                U+0068 LATIN SMALL LETTER H
i   {hgh,fnt,unr,vwl}        U+0069 LATIN SMALL LETTER I
j   {pal,apr}/{vcd,pal,frc}  U+006A LATIN SMALL LETTER J
k   {vls,vel,stp}            U+006B LATIN SMALL LETTER K
l   {vcd,alv,lat}            U+006C LATIN SMALL LETTER L
m   {blb,nas}                U+006D LATIN SMALL LETTER M
n   {alv,nas}                U+006E LATIN SMALL LETTER N
o   {umd,bck,rnd,vwl}        U+006F LATIN SMALL LETTER O
p   {vls,blb,stp}            U+0070 LATIN SMALL LETTER P
q   {vls,uvl,stp}            U+0071 LATIN SMALL LETTER Q
r   {alv,apr}                U+0279 LATIN SMALL LETTER TURNED R
s   {vls,alv,frc}            U+0073 LATIN SMALL LETTER S
t   {vls,alv,stp}            U+0074 LATIN SMALL LETTER T
u   {hgh,bck,rnd,vwl}        U+0075 LATIN SMALL LETTER U
v   {vcd,lbd,frc}            U+0076 LATIN SMALL LETTER V
w   {lbv,apr}/{vcd,lbv,frc}  U+0077 LATIN SMALL LETTER W
x   {vls,vel,frc}            U+0078 LATIN SMALL LETTER X
y   {hgh,fnt,rnd,vwl}        U+0079 LATIN SMALL LETTER Y
z   {vcd,alv,frc}            U+007A LATIN SMALL LETTER Z

A   {low,bck,unr,vwl}        U+0251 LATIN SMALL LETTER SCRIPT A
B   {vcd,blb,frc}            U+03B2 GREEK SMALL LETTER BETA
C   {vls,pal,frc}            U+00E7 LATIN SMALL LETTER C CEDILLA
D   {vcd,dnt,frc}            U+00F0 LATIN SMALL LETTER ETH
E   {lmd,fnt,unr,vwl}        U+025B LATIN SMALL LETTER EPSILON
F   -- Unused --
G   {vcd,uvl,stp}            U+0262 LATIN LETTER SMALL CAPITAL G
H   {vls,phr,frc}            U+0127 LATIN SMALL LETTER H BAR
I   {smh,fnt,unr,vwl}        U+026A LATIN LETTER SMALL CAPITAL I
                             U+0269 LATIN SMALL LETTER IOTA
J   {vcd,pal,stp}            U+025F LATIN SMALL LETTER DOTLESS J BAR
K   -- Unused --
L   {vcd,vel,lat} ??         U+026B LATIN SMALL LETTER L WITH MIDDLE TILDE
                             U+029F LATIN LETTER SMALL CAPITAL L
    {vls,alv,lat,frc} ??     U+026C LATIN SMALL LETTER L BELT
M   {lbd,nas}                U+0271 LATIN SMALL LETTER M HOOK
N   {vel,nas}                U+014B LATIN SMALL LETTER ENG
O   {lmd,bck,rnd,vwl}        U+0254 LATIN SMALL LETTER OPEN O
P   {vls,blb,frc}            U+03A6 GREEK CAPITAL LETTER PHI
Q   {vcd,vel,frc}            U+0263 LATIN SMALL LETTER GAMMA
R   {mid,cnt,rzd,vwl}        U+025A LATIN SMALL LETTER SCHWA HOOK
    ?? {alv,trl} ??          U+0280 LATIN LETTER SMALL CAPITAL R
S   {vls,pla,frc}            U+0283 LATIN SMALL LETTER ESH
T   {vls,dnt,frc}            U+03B8 GREEK SMALL LETTER THETA
U   {smh,bck,rnd,vwl}        U+028A LATIN SMALL LETTER UPSILON
                             U+0277 LATIN SMALL LETTER CLOSED OMEGA
V   {lmd,bck,unr,vwl}        U+028C LATIN SMALL LETTER TURNED V
W   {lmd,fnt,rnd,vwl} ??     U+0153 LATIN SMALL LETTER O E
X   {vls,uvl,frc}            U+03C7 GREEK SMALL LETTER CHI
Y   {umd,fnt,rnd,vwl}        U+00F8 LATIN SMALL LETTER O SLASH
    ?? {lmd,fnt,rnd,vwl} ??  U+0153 LATIN SMALL LETTER O E
Z   {vcd,pla,frc}            U+0292 LATIN SMALL LETTER YOGH

?   {glt,stp}                U+0294 LATIN SMALL LETTER GLOTTAL STOP
@   {mid,cnt,unr,vwl}        U+0259 LATIN SMALL LETTER SCHWA
&   {low,fnt,unr,vwl}        U+00E6 LATIN SMALL LETTER A E
*   {vcd,alv,flp}            U+027E LATIN SMALL LETTER FISHHOOK R
%   -- Ad Hoc Segment --
$   -- Ad Hoc Segment --

Appendix C. Diacritics
~     Vowels: {nzd}       U+0303 NON-SPACING TILDE
      Consonants: {vzd}   U+0334 NON-SPACING TILDE OVERLAY
:     {lng}               U+02D0 MODIFIER LETTER TRIANGULAR COLON
-     Vowels: {unr}       -- No equivalent --
      Consonants: {syl}   U+0329 NON-SPACING VERTICAL LINE BELOW
!     {clk}               -- No equivalent --
.     Vowels: {rnd}       -- No equivalent --
      Consonants: {rfx}   U+0322 NON-SPACING RETROFLEX HOOK BELOW
                          U+0323 NON-SPACING DOT BELOW
`     Voiceless: {ejc}    U+02BC MODIFIER LETTER APOSTROPHE
      Voiced: {imp}       -- No equivalent --
[     {dnt}               U+032A NON-SPACING BRIDGE BELOW
;     {pzd}               U+02B2 MODIFIER LETTER SMALL J
                          U+0321 NON-SPACING PALATALIZED HOOK BELOW
"     Vowels: {cnt}       -- No equivalent --
      Consonants: {uvl}   -- No equivalent --
^     {pal}               -- No equivalent --
+     -- Ad Hoc Diacritic --
=     -- Ad Hoc Diacritic --
<H>   {fzd}               U+0334 NON-SPACING TILDE OVERLAY
<h>   {asp}               U+02B0 MODIFIER LETTER SMALL H
<o>   {unx} ??            U+02DA SPACING RING ABOVE
      {vls} ??            U+0325 NON-SPACING RING BELOW
<r>   {rzd}               U+02B3 MODIFIER LETTER SMALL R
<w>   {lzd}               U+02B7 MODIFIER LETTER SMALL W
                          U+032B NON-SPACING INVERTED DOUBLE ARCH BELOW
<?>   {mrm}               U+02B1 MODIFIER LETTER SMALL H HOOK
                          U+0324 NON-SPACING DOUBLE DOT BELOW
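Several of these diacritics mean different things on vowels and on consonants, and the features they supply override those of the segment. A minimal Python sketch of that resolution follows. The grouping of features into mutually exclusive sets (`PLACES`, `ROUNDING`, `BACKNESS`) is an assumption of this sketch, not part of the specification, and the function name `apply_diacritic` is hypothetical. The expected results were checked against the entries for /A./, /t./, /i"/ and /n"/ in appendix E.

```python
# Assumed groups of mutually exclusive features; a diacritic feature
# replaces any segment feature from the same group.
PLACES = {"blb", "lbd", "dnt", "alv", "rfx", "pla", "pal",
          "vel", "lbv", "uvl", "phr", "glt"}
ROUNDING = {"unr", "rnd"}
BACKNESS = {"fnt", "cnt", "bck"}
GROUPS = [PLACES, ROUNDING, BACKNESS]

# Context-dependent diacritics from appendix C.
CONTEXT_DIACRITICS = {
    ".": {"vowel": "rnd", "consonant": "rfx"},
    '"': {"vowel": "cnt", "consonant": "uvl"},
    "-": {"vowel": "unr", "consonant": "syl"},
    "~": {"vowel": "nzd", "consonant": "vzd"},
}


def apply_diacritic(features, mark):
    """Return the segment's features with the diacritic's feature applied."""
    kind = "vowel" if "vwl" in features else "consonant"
    new_feature = CONTEXT_DIACRITICS[mark][kind]
    result = set(features)
    for group in GROUPS:
        if new_feature in group:
            result -= group  # override the conflicting feature, if any
    result.add(new_feature)
    return frozenset(result)


# /A./ is the rounded counterpart of /A/; /t./ is the retroflex stop.
print(sorted(apply_diacritic({"low", "bck", "unr", "vwl"}, ".")))
# ['bck', 'low', 'rnd', 'vwl']
print(sorted(apply_diacritic({"vls", "alv", "stp"}, ".")))
# ['rfx', 'stp', 'vls']
```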

Appendix D. Segment Table


blb-- -lbd-- --dnt-- --alv-- -rfx- -pla-- --pal--- --vel-- -----uvl----nas stp frc apr lat trl flp ejc imp clk m p b P B b<trl> p` b` p! ---- lbv ---nas stp frc apr n<lbv> t<lbv> d<lbv> w<vls> w w ----- unr ----fnt cnt bck hgh smh umd mid lmd low i I e E & i" @<umd> @ V" a uoV A R<umd> R t! t[` d` M f v r<lbd> n[ t[ d[ T D r[ l[ n n. t d t. d. s z s. z. r r. l l. r<trl> * *. t` d` c! ---glt--? h<?> h ----- rnd ----fnt cnt bck y I. Y W &. u" @. O" a. u U o O A. alv lat frc: s<lat> z<lat> lat flp: *<lat> lat clk: l! n^ c J C C<vcd> j l^ c` J` c! k! N k g q x Q X j<vel> L k' g` q` G` n" G g" g" r"

S Z

--phr--

H H<vcd>

unr cnt rzd

Appendix E. Segment List


Where a segment requires more than one character to represent, and there is a single IPA character, the Unicode code and name is noted. If I couldn't find an IPA symbol for a segment, I left it out.
{blb,nas} {vls,blb,stp} {vcd,blb,stp} {vls,blb,frc} {vcd,blb,frc} {blb,trl} {blb,imp} {blb,ejc} {blb,clk} {lbd,nas} {vls,lbd,frc} {vcd,lbd,frc} {lbd,apr} {dnt,nas} {vls,dnt,stp} {vcd,dnt,stp} {vls,dnt,frc} {vcd,dnt,frc} {dnt,apr} {dnt,lat} {dnt,imp} {dnt,ejc} /m/ /p/ /b/ /P/ /B/ /b<trl>/ /b`/ /p`/ /p!/ /M/ /f/ /v/ /r<lbd>/

U+0299 LATIN LETTER SMALL CAPITAL B U+0253 LATIN SMALL LETTER B HOOK U+0298 LATIN LETTER BULLSEYE

U+028B LATIN SMALL LETTER SCRIPT V

/n[/ /t[/ /d[/ /T/ /D/ /r[/ /l[/ /d`/ (same as {alv,imp}) /t[`/

{dnt,clk}

/t!/ U+0287 LATIN SMALL LETTER TURNED T (by rights this should be alveolar, but the alveolar and palatal clicks use the same symbol (/c!/))

{alv,nas} {vls,alv,stp} {vcd,alv,stp} {vls,alv,frc} {vcd,alv,frc} {alv,apr} {alv,lat} {alv,trl}

/n/ /t/ /d/ /s/ /z/ /r/ /l/ /r<trl>/ U+0072 LATIN SMALL LETTER R (perhaps /R/) {alv,flp} /*/ {vls,alv,lat,frc} /s<lat>/ U+026C LAIN SMALL LETTER L BELT {vcd,alv,lat,frc} /z<lat>/ U+026E LATIN SMALL LETTER L YOGH {alv,imp} /d`/ U+0257 LATIN SMALL LETTER D HOOK {alv,ejc} /t`/ {alv,clk} /c!/ (same as {pal,clk}) {rfx,nas} {vls,rfx,stp} {vcd,rfx,stp} {vls,rfx,frc} {vcd,rfx,frc} {rfx,apr} {rfx,lat} {rfx,flp} {vls,pla,frc} {vcd,pla,frc} {pal,nas} {vls,pal,stp} {vcd,pal,stp} {vls,pal,frc} {vcd,pal,frc} {pal,apr} {rnd,pal,apr} {pal,lat} {pal,imp} {pal,clk} {vel,nas} {vls,vel,stp} {vcd,vel,stp} {vls,vel,frc} {vcd,vel,frc} {vel,apr} {vel,lat} {vel,imp} {vel,ejc} {vel,clk} {lbv,nas} {vls,lbv,stp} {vcd,lbv,stp} {vls,lbv,frc} {vcd,lbv,frc} {lbv,apr} {uvl,nas} {vls,uvl,stp} {vcd,uvl,stp} /n./ /t./ /d./ /s./ /z./ /r./ /l./ /*./ /S/ /Z/ /n^/ /c/ /J/ /C/ /C<vcd>/ U+029D LATIN SMALL LETTER CROSSED-TAIL J (perhaps /j/ (same as {pal,apr})) /j/ /j<rnd>/ /l^/ /J`/ /c!/ /N/ /k/ /g/ /x/ /Q/ /j<vel>/ /L/ /g`/ /k'/ /k!/ U+0265 U+028E U+0284 U+0297 LATIN LATIN LATIN LATIN SMALL LETTER TURNED H SMALL LETTER TURNED Y SMALL LETTER DOTLESS J BAR HOOK LETTER STRETCHED C U+0273 U+0288 U+0256 U+0282 U+0290 U+027B U+026D U+027D LATIN LATIN LATIN LATIN LATIN LATIN LATIN LATIN SMALL SMALL SMALL SMALL SMALL SMALL SMALL SMALL LETTER LETTER LETTER LETTER LETTER LETTER LETTER LETTER N RETROFLEX HOOK T RETROFLEX HOOK D RETROFLEX HOOK S HOOK Z RETROFLEX HOOK TURNED R HOOK L RETROFLEX HOOK R HOOK

U+0270 LATIN SMALL LETTER TURNED M WITH LONG LEG U+0260 LATIN SMALL LETTER G HOOK U+029E LATIN SMALL LETTER TURNED K

/n<lbv>/ Written as "ng" with tie above /t<lbv>/ Written as "kp" with tie above /d<lbv>/ Written as "gb" with tie above /w<vls>/ U+028D LATIN SMALL LETTER TURNED W /w/ (same as {lbv,apr}) /w/ /n"/ /q/ /G/ U+0274 LATIN LETTER SMALL CAPITAL N

{vls,uvl,frc} {vcd,uvl,frc} {uvl,apr} {uvl,trl} {vls,uvl,imp} {vcd,uvl,imp} {vls,phr,frc} {vcd,phr,frc} {glt,stp} {glt,apr} {mrm,glt,frc} {vcd,lat,flp} {lat,clk}

/X/ U+03C7 GREEK SMALL LETTER /g"/ (same as {uvl,apr}) /g"/ U+0281 LATIN LETTER SMALL /r"/ U+0280 LATIN LETTER SMALL /q`/ U+02A0 LATIN SMALL LETTER /G`/ U+029B LATIN LETTER SMALL /H/ /H<vcd>/ /?/ /h/ /h<?>/ /*<lat>/ /l!/ /i/ /y/ /I/ /I./ /e/ /Y/ /E/ /W/ /&/ /&./

CHI CAPITAL INVERTED R CAPITAL R Q HOOK CAPITAL G HOOK

U+0295 LATIN LETTER REVERSED GLOTTAL STOP

U+0266 LATIN SMALL LETTER H HOOK U+027A LATIN SMALL LETTER TURNED R WITH LONG LEG U+0296 LATIN LETTER INVERTED GLOTTAL STOP

{hgh,fnt,unr,vwl} {hgh,fnt,rnd,vwl} {smh,fnt,unr,vwl} {smh,fnt,rnd,vwl} {umd,fnt,unr,vwl} {umd,fnt,rnd,vwl} {lmd,fnt,unr,vwl} {lmd,fnt,rnd,vwl} {low,fnt,unr,vwl} {low,fnt,rnd,vwl}

U+028F LATIN LETTER SMALL CAPITAL Y

U+0276 LATIN LETTER SMALL CAPITAL O E

{hgh,cnt,unr,vwl} /i"/ U+0268 LATIN SMALL LETTER BARRED I {hgh,cnt,rnd,vwl} /u"/ U+0289 LATIN SMALL LETTER U BAR {umd,cnt,unr,vwl} /@<umd>/ U+0258 LATIN SMALL LETTER REVERSED E {umd,cnt,unr,rzd,vwl} /R<umd>/ U+025D LATIN SMALL LETTER REVERSED EPSILON HOOK {mid,cnt,unr,vwl} /@/ {mid,cnt,unr,rzd,vwl} /R/ {mid,cnt,rnd,vwl} /@./ U+0275 LATIN SMALL LETTER BARRED O {lmd,cnt,unr,vwl} /V"/ U+025C LATIN SMALL LETTER REVERSED EPSILON {lmd,cnt,rnd,vwl} /O"/ U+025E LATIN SMALL LETTER CLOSED REVERSED EPSILON {low,cnt,unr,vwl} /a/ U+0061 LATIN SMALL LETTER A {hgh,bck,unr,vwl} /u-/ {hgh,bck,rnd,vwl} /u/ {smh,bck,rnd,vwl} /U/ {umd,bck,unr,vwl} {umd,bck,rnd,vwl} {lmd,bck,unr,vwl} {lmd,bck,rnd,vwl} {low,bck,unr,vwl} {low,bck,rnd,vwl} /o-/ /o/ /V/ /O/ /A/ /A./ U+026F LATIN SMALL LETTER TURNED M U+028A LATIN SMALL LETTER UPSILON U+0277 LATIN SMALL LETTER CLOSED OMEGA U+0264 LATIN SMALL LETTER BABY GAMMA

U+0252 LATIN SMALL LETTER TURNED SCRIPT A

Appendix F. ASCII Table


In the following table, the following abbreviations are used:
P:    punctuation
S:    segment
D:    diacritic
<D>:  diacritic that must be delimited

sp   P: Separate words and segments
!    D: {clk}
"    D: Vowel: {cnt}  Cons: {uvl}
#    Unused
$    S: Ad Hoc
%    S: Ad Hoc
&    S: {low,fnt,unr,vwl}
'    P: Primary stress
(    Unused
)    Unused
*    S: {vcd,alv,flp}
+    D: Ad Hoc
,    P: Secondary stress
-    D: Vowel: {unr}  Cons: {syl}
.    D: Vowel: {rnd}  Cons: {rfx}
/    P: Phonemic delimiter
0    Unused
1    P: Tone 1
2    P: Tone 2
3    P: Tone 3
4    P: Tone 4
5    Unused
6    Unused
7    Unused
8    Unused
9    Unused
:    D: {lng}
;    D: {pzd}
<    P: Diacritic delimiter
=    D: Ad Hoc
>    P: Diacritic delimiter
?    S: {glt,stp}                <D>: {mrm}
@    S: {mid,cnt,unr,vwl}
A    S: {low,bck,unr,vwl}
B    S: {vcd,blb,frc}
C    S: {vls,pal,frc}
D    S: {vcd,dnt,frc}
E    S: {lmd,fnt,unr,vwl}
F    Unused
G    S: {vcd,uvl,stp}
H    S: {vls,phr,frc}            <D>: {fzd}
I    S: {smh,fnt,unr,vwl}
J    S: {vcd,pal,stp}
K    Unused
L    S: {vcd,vel,lat}
M    S: {lbd,nas}
N    S: {vel,nas}
O    S: {lmd,bck,rnd,vwl}
P    S: {vls,blb,frc}
Q    S: {vcd,vel,frc}
R    S: {mid,cnt,rzd,vwl}
S    S: {vls,pla,frc}
T    S: {vls,dnt,frc}
U    S: {smh,bck,rnd,vwl}
V    S: {lmd,bck,unr,vwl}
W    S: {lmd,fnt,rnd,vwl}
X    S: {vls,uvl,frc}
Y    S: {umd,fnt,rnd,vwl}
Z    S: {vcd,pla,frc}
[    P: Phonetic delimiter       D: {dnt}
\    Unused
]    P: Phonetic delimiter
^    D: {pal}
_    Unused
`    D: Voiced: {imp}  Voiceless: {ejc}
a    S: {low,cnt,unr,vwl}
b    S: {vcd,blb,stp}
c    S: {vls,pal,stp}
d    S: {vcd,alv,stp}
e    S: {umd,fnt,unr,vwl}
f    S: {vls,lbd,frc}
g    S: {vcd,vel,stp}
h    S: {glt,apr}                <D>: {asp}
i    S: {hgh,fnt,unr,vwl}
j    S: {pal,apr}/{vcd,pal,frc}  <D>: {pzd}
k    S: {vls,vel,stp}
l    S: {vcd,alv,lat}
m    S: {blb,nas}
n    S: {alv,nas}
o    S: {umd,bck,rnd,vwl}        <D>: {unx}
p    S: {vls,blb,stp}
q    S: {vls,uvl,stp}
r    S: {alv,apr}                <D>: {rzd}
s    S: {vls,alv,frc}
t    S: {vls,alv,stp}
u    S: {hgh,bck,rnd,vwl}
v    S: {vcd,lbd,frc}
w    S: {lbv,apr}/{vcd,lbv,frc}  <D>: {lzd}
x    S: {vls,vel,frc}
y    S: {hgh,fnt,rnd,vwl}
z    S: {vcd,alv,frc}
{    P: Feature set delimiter
|    Unused
}    P: Feature set delimiter
~    D: Cons: {vzd}  Vowel: {nzd}

Summary of IPA/ASCII transcription for English


[Last Modified, 12 Mar 1993]

To aid English speakers in using the phonetic transcription, this document describes the mapping onto a standard American dictionary transcription system for sounds that commonly occur in the English language. When it differs from the symbol used, I've also included a description of the IPA symbol for the benefit of non-Americans. The table is taken from the 'Pronunciation Symbols' page of Merriam-Webster's New Collegiate Dictionary. In the examples, the letters which spell the sound are bracketed by '<...>'.

Note that this only describes a small subset of the transcription system. There are far more sounds (used in other languages) and nuances of sound that can be captured. See the document describing the full standard for complete details.

Phonemic (broad) transcriptions are bracketed by '/.../'.
Phonetic (narrow) transcriptions are bracketed by '[...]'.
Syllables that carry primary stress are preceded by "'".
Syllables that carry secondary stress are preceded by ",".
When giving the transcription of a single word, spaces are generally inserted between syllables (often omitted before syllables that have stress marks).
When giving the transcription of a multiword utterance, it is common to put spaces between words and omit them between syllables.

/@/: schwa (upside-down 'e'). Used in both unaccented ('b<a>nan<a>', 'c<o>llide', '<a>but'), and accented ('h<u>mdr<u>m', 'ab<u>t') contexts. The IPA symbol is a schwa. [British speakers often have different vowels in these two contexts. The accented one is further back and is written /V/. Its IPA symbol is a 'wedge' or upside-down 'v'.]
/l-/, /n-/, /m-/, /N-/: Superscript schwa preceding consonant. As in 'batt<le>', 'mitt<en>', 'eat<en>'. Signifies that the consonant is pronounced as a syllable by itself. The IPA symbol is a vertical bar below the consonant.
/R/: schwa followed by 'r'. 'op<er>ation', 'f<ur>th<er>', '<ur>g<er>'. The IPA symbol is a schwa with a hook.
/&/: short a. 'm<a>t', 'm<a>p', 'm<a>d', 'g<a>g', 'sn<a>p', 'p<a>tch'. The IPA symbol is an 'a-e' digraph.
/eI/: long a ('a' with bar above). 'd<ay>', 'f<a>de', 'd<a>te', '<a>orta', 'dr<a>pe', 'c<a>pe'.
/A/: a with diaeresis (two dots) above. 'b<o>ther', 'c<o>t', and, with most American speakers, 'f<a>ther', 'c<a>rt'. The IPA symbol is a script 'a'.
/a/: a with dot above. 'f<a>ther' as pronounced by speakers who do not rhyme it with bother.
/AU/: a followed by u with dot. 'n<ow>', 'l<ou>d', '<ou>t'.
/b/: '<b>a<b>y', 'ri<b>'.
/tS/: ch. The dictionary notes "(actually, this sound is \t\ + \sh\)". '<ch>in', 'na<tu>re' (/'neI tSR/). In IPA transcription, this is sometimes spelled as 'c with hacek'.
/d/: '<d>i<d>', 'a<dd>er'.
/E/: short e. 'b<e>t', 'b<e>d', 'p<e>ck'. The IPA symbol is a lower-case epsilon. It is sometimes spelled with a small capital E.
/i/: long e ('e' with bar above). 'b<ea>t', 'nosebl<ee>d', '<e>venl<y>', '<ea>s<y>'.
/f/: '<f>i<f>ty', 'cu<ff>'.
/g/: '<g>o', 'bi<g>', '<g>ift'.
/h/: '<h>at', 'a<h>ead'.
/hw/: '<wh>ale' as pronounced by those who do not have the same pronunciation for both 'whale' and 'wail'.
/I/: short i. 't<i>p', 'ban<i>sh', 'act<i>ve'. The IPA symbol is a small capital I or a lower-case iota.
/aI/: long i ('i' with bar above). 's<i>te', 's<i>de', 'b<uy>', 'tr<i>pe'.
/dZ/: j. The dictionary notes "(actually, this sound is \d\ + \zh\)". '<j>ob', '<g>em', 'e<dge>', '<j>oin', '<j>u<dge>'.
/k/: '<k>in', '<c>oo<k>', 'a<che>'.
/x/: k with bar below. (Same as /C/.) German 'Bu<ch>'.
/C/: k with bar below. (Same as /x/.) German 'i<ch>'.
/l/: '<l>i<l>y', 'poo<l>'.
/m/: '<m>ur<m>ur', 'di<m>', 'ny<m>ph'.
/n/: '<n>o', 'ow<n>'.
/<vowel>~/: superscript 'n'. "indicates that a preceding vowel or diphthong is pronounced with the nasal passages open as in French 'un bon vin blanc' /W~ bo~ va~ blA~/". The IPA diacritic is a tilde above the vowel.
/N/: eng ('n' with a tail). 'si<ng>' /sIN/, 'si<ng>er' /'sIN R/, 'fi<ng>er' /'fIN gR/, 'i<n>k' /iNk/. The IPA symbol is an eng.
/oU/: long o ('o' with bar above). 'b<o>ne', 'kn<ow>', 'b<eau>'.
/O/: 'o' with dot above. 's<aw>', '<a>ll', 'gn<aw>'. The IPA symbol is a small open 'o' or upside-down 'c'.
/W/: o-e digraph. French 'b<oeu>f', German 'H<o:>lle'. The IPA symbol is an o-e digraph.
/Oi/: 'o' with dot above followed by 'i'. 'c<oi>n', 'destr<oy>'. [The dictionary also lists 's<awi>ng', but I pronounce that as two separate syllables /'sO IN/.]
/p/: '<p>e<pp>er', 'li<p>'.
/r/: '<r>ed', 'ca<r>', '<r>a<r>ity'.
/s/: '<s>our<ce>', 'le<ss>'.
/S/: sh. '<sh>y', 'mi<ssi>on', 'ma<ch>ine', 'spe<ci>al'. The IPA symbol is an esh: a tall, pulled 's' or long, barless 'f'.
/t/: '<t>ie', 'a<tt>ack'.
/T/: th. '<th>in', 'e<th>er'. The IPA symbol is a lower-case theta.
/D/: 'th' with bar below. '<th>en', 'ei<th>er', '<th>is'. The IPA symbol is an eth, sort of a script 'd' with the bar crossed.
/u/: 'u' with diaeresis (two dots) above. 'r<u>le', 'y<ou>th', 'union' /'jun j@n/, 'few' /fju/.
/U/: 'u' with dot above. 'p<u>ll', 'w<oo>d', 'b<oo>k', 'curable' /'kjUr @ b@l/. The IPA symbol is a small letter upsilon. A small capital U or closed lower-case omega is also used.
/y/: u-e digraph. German 'f<u:>llen', 'h<u:>bsch', French 'r<ue>'.
/v/: '<v>i<v>id', 'gi<ve>'.
/w/: '<w>e', 'a<w>ay'.
/j/: '<y>ard', '<y>oung', 'cue' /kju/, 'union' /'jun j@n/.
/<cons>;/: superscript 'y' following consonant; "indicates that during the articulation of the sound represented by the preceding character, the front of the tongue has substantially the position it has for the articulation of the first sound of 'yard', as in French 'digne' /din;/." The IPA diacritic is a superscript 'j' following or hook below the consonant.
/ju/: '<you>th', '<u>nion', 'c<ue>', 'f<ew>', 'm<u>te'.
/jU/: 'c<u>rable', 'f<u>ry'.
/z/: '<z>one', 'rai<se>'.
/Z/: zh. 'vi<si>on', 'azure' /'aZ R/. The IPA symbol is a yogh: like a flat-topped '3' lowered so that the top is the height of that of a 'z'.
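As a rough illustration of the mapping described in this summary, the following Python sketch converts the single-character symbols to their IPA equivalents. The names `KIRSHENBAUM_TO_IPA` and `kirshenbaum_to_ipa` are hypothetical, the table covers only symbols whose IPA shape is stated above, and the diacritics (`-`, `~`, `;`) are deliberately left untouched.

```python
# Hypothetical lookup table: only the single-character English symbols
# whose IPA shape is named in the summary above.
KIRSHENBAUM_TO_IPA = {
    "@": "\u0259",  # schwa
    "R": "\u025A",  # schwa with hook
    "&": "\u00E6",  # a-e digraph
    "A": "\u0251",  # script a
    "E": "\u025B",  # epsilon
    "I": "\u026A",  # small capital I
    "N": "\u014B",  # eng
    "O": "\u0254",  # open o
    "S": "\u0283",  # esh
    "T": "\u03B8",  # theta
    "D": "\u00F0",  # eth
    "U": "\u028A",  # upsilon
    "V": "\u028C",  # wedge
    "Z": "\u0292",  # ezh (called yogh above)
    "'": "\u02C8",  # primary stress
    ",": "\u02CC",  # secondary stress
}


def kirshenbaum_to_ipa(text):
    """Replace each known ASCII symbol; pass everything else through."""
    return "".join(KIRSHENBAUM_TO_IPA.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # 'singer' as transcribed in the summary
    print(kirshenbaum_to_ipa("/'sIN R/"))
```

Delimiters such as `/.../` and ordinary lower-case letters pass through unchanged, which matches the way the summary keeps them identical to IPA.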

SAMPA - computer readable phonetic alphabet

SAMPA (Speech Assessment Methods Phonetic Alphabet) is a machine-readable phonetic alphabet. It was originally developed under the ESPRIT project 1541, SAM (Speech Assessment Methods) in 1987-89 by an international group of phoneticians, and was applied in the first instance to the European Communities languages Danish, Dutch, English, French, German, and Italian (by 1989); later to Norwegian and Swedish (by 1992); and subsequently to Greek, Portuguese, and Spanish (1993). Under the BABEL project, it has now been extended to Bulgarian, Estonian, Hungarian, Polish, and Romanian (1996). Under the aegis of COCOSDA it is hoped to extend it to cover many other languages (and in principle all languages). On the initiative of the OrienTel project, Arabic, Hebrew, and Turkish have been added. Other recent additions: Cantonese, Croatian, Czech, Russian, Slovenian, Thai. Coming shortly: Japanese, Korean.

Where Unicode (ISO 10646) is not available or not appropriate, SAMPA and the proposed X-SAMPA (Extended SAMPA) constitute the best robust international collaborative basis for a standard machine-readable encoding of phonetic notation.

Note about Unicode: Recent versions of the Internet Explorer and Netscape browsers are capable of handling WGL4, the subset of Unicode needed for the orthography of all the languages of Europe. Test yours by looking at this page, or download an up-to-date browser and a WGL4 font. Unicode SAMPA pages are now available with correct local orthography, for those with this capacity, for Bulgarian, Czech, Greek, Hungarian, Polish, Romanian, and Slovenian. See if your browser can cope with Unicode IPA symbols by looking at this special version of the English SAMPA page. For IPA in Unicode, see here.

SAMPA basically consists of a mapping of symbols of the International Phonetic Alphabet onto ASCII codes in the range 33..126, the 7-bit printable ASCII characters. Associated with the coding (mapping) are guidelines for the transcription of the languages to which SAMPA has been applied.

Unlike other proposals for mapping the IPA onto ASCII, SAMPA is not one single author's scheme, but represents the outcome of collaboration and consultation among speech researchers in many different countries. The SAMPA transcription symbols have been developed by or in consultation with native speakers of every language to which they have been applied, but are standardized internationally.

A SAMPA transcription is designed to be uniquely parsable. As with the ordinary IPA, a string of SAMPA symbols does not require spaces between successive symbols.

SAMPA has been applied not only by the SAM partners collaborating on EUROM 1, but also in other speech research projects (e.g. BABEL, Onomastica, OrienTel) and by Oxford University Press. It is included among the resources listed by the Linguistic Data Consortium.

In its basic form SAMPA was seen as catering essentially for segmental transcription, particularly of a traditional phonemic or near-phonemic kind. Prosodic notation was not adequately developed. This shortcoming has now been remedied by a proposed parallel system of prosodic notation, SAMPROSA. It is important that prosodic and segmental transcriptions be kept distinct from one another, on separate representational tiers (because certain symbols have different meanings in SAMPROSA from their meaning in SAMPA: e.g. H denotes a labial-palatal semivowel in SAMPA, but High tone in SAMPROSA).

A proposal for an extended version of the segmental alphabet, X-SAMPA, extends the basic agreed conventions so as to make provision for every symbol on the Chart of the International Phonetic Association, including all diacritics. In principle this makes it possible to produce a machine-readable phonetic transcription for every known human language.

The present SAMPA recommendations (as devised for the basic six languages) are set out in the following table. All IPA symbols that coincide with lower-case letters of the Latin alphabet remain the same; all other symbols are recoded within the ASCII range 37..126. In this current WWW document the IPA symbols cannot be shown, but the columns indicate respectively a SAMPA symbol, its ASCII/ANSI number (decimal), the shape of the corresponding IPA symbol, the Unicode number (hex, decimal) for the IPA symbol, and the symbol's meaning or use.

SAMPA   ASCII  IPA                 Unicode     label and exemplification
symbol                             hex   dec.

Vowels
A       65     script a            0251  593   open back unrounded, Cardinal 5, Eng. start
{       123    ae ligature         00E6  230   near-open front unrounded, Eng. trap
6       54     turned a            0250  592   open schwa, Ger. besser
Q       81     turned script a     0252  594   open back rounded, Eng. lot
E       69     epsilon             025B  603   open-mid front unrounded, C3, Fr. même
@       64     turned e            0259  601   schwa, Eng. banana
3       51     rev. epsilon        025C  604   long mid central, Eng. nurse
I       73     small cap I         026A  618   lax close front unrounded, Eng. kit
O       79     turned c            0254  596   open-mid back rounded, Eng. thought
2       50     o-slash             00F8  248   close-mid front rounded, Fr. deux
9       57     oe ligature         0153  339   open-mid front rounded, Fr. neuf
&       38     s.c. OE lig.        0276  630   open front rounded
U       85     upsilon             028A  650   lax close back rounded, Eng. foot
}       125    barred u            0289  649   close central rounded, Swedish sju
V       86     turned v            028C  652   open-mid back unrounded, Eng. strut
Y       89     small cap Y         028F  655   lax [y], Ger. hübsch

Consonants
B       66     beta                03B2  946   voiced bilabial fricative, Sp. cabo
C       67     c-cedilla           00E7  231   voiceless palatal fricative, Ger. ich
D       68     eth                 00F0  240   voiced dental fricative, Eng. then
G       71     gamma               0263  611   voiced velar fricative, Sp. fuego
L       76     turned y            028E  654   palatal lateral, It. famiglia
J       74     left-tail n         0272  626   palatal nasal, Sp. año
N       78     eng                 014B  331   velar nasal, Eng. thing
R       82     inv. s.c. R         0281  641   vd. uvular fric. or trill, Fr. roi
S       83     esh                 0283  643   voiceless palatoalveolar fricative, Eng. ship
T       84     theta               03B8  952   voiceless dental fricative, Eng. thin
H       72     turned h            0265  613   labial-palatal semivowel, Fr. huit
Z       90     ezh (yogh)          0292  658   vd. palatoalveolar fric., Eng. measure
?       63     dotless ?           0294  660   glottal stop, Ger. Verein, also Danish stød

Length, stress and tone marks
:       58     length mark         02D0  720   length mark
"       34     vertical stroke     02C8  712   primary stress *
%       37     low vert. str.      02CC  716   secondary stress
`       96     (see note 1)                    falling tone
'       39     (see note 1)                    rising tone

Note 1: The SAMPA tone mark recommendations were based on the IPA as it was up to 1989-90. Since then, however, the IPA has changed its symbols for falling and rising tones. These SAMPA tone marks may now be considered obsolete, having in practice been superseded by the SAMPROSA proposals.

Diacritics (shown with another symbol as an example)
=n      61     inf. stroke         0329  809   syllabic consonant, Eng. garden (see note 2)
O~      126    sup. tilde          0303  771   nasalization, Fr. bon

Note 2: At the time SAMPA was established it was assumed that the syllabicity diacritic should precede the base character. More recently, ISO and Unicode have established that all diacritics should follow the base character, and this principle should be applied in future work.
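Because every basic SAMPA symbol in the table is a single ASCII character and lower-case letters stand for themselves, a conversion to Unicode IPA can be sketched as a per-character lookup. The Python below is only an illustration under those assumptions: the names `SAMPA_TO_CODEPOINT` and `sampa_to_ipa` are invented here, the code points are the hex values from the table, and the obsolete tone marks and the two diacritics (`=` and `~`) are not handled.

```python
# Hypothetical table built from the hex column above
# (vowels, consonants, length and stress marks only).
SAMPA_TO_CODEPOINT = {
    "A": 0x0251, "{": 0x00E6, "6": 0x0250, "Q": 0x0252,
    "E": 0x025B, "@": 0x0259, "3": 0x025C, "I": 0x026A,
    "O": 0x0254, "2": 0x00F8, "9": 0x0153, "&": 0x0276,
    "U": 0x028A, "}": 0x0289, "V": 0x028C, "Y": 0x028F,
    "B": 0x03B2, "C": 0x00E7, "D": 0x00F0, "G": 0x0263,
    "L": 0x028E, "J": 0x0272, "N": 0x014B, "R": 0x0281,
    "S": 0x0283, "T": 0x03B8, "H": 0x0265, "Z": 0x0292,
    "?": 0x0294, ":": 0x02D0, '"': 0x02C8, "%": 0x02CC,
}


def sampa_to_ipa(text):
    """Map each SAMPA character to IPA; unknown characters pass through."""
    out = []
    for ch in text:
        codepoint = SAMPA_TO_CODEPOINT.get(ch)
        out.append(chr(codepoint) if codepoint is not None else ch)
    return "".join(out)


if __name__ == "__main__":
    print(sampa_to_ipa("TIN"))      # Eng. thing
    print(sampa_to_ipa('"stA:t'))   # Eng. start, with stress and length
```

No spaces are needed between symbols, which is the "uniquely parsable" property described earlier: each character can be decoded without looking at its neighbours.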
The phonemic notation of individual languages

These pages provide a brief outline of the phonemic distinctions in various languages: Arabic, Bulgarian, Cantonese, Czech, Croatian, Danish, Dutch, English, Estonian, French, German, Greek, Hebrew, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Thai, Turkish.

Extensions

These pages provide extensions of the basic segmental SAMPA: SAMPROSA (prosodic), X-SAMPA (other symbols, mainly segmental).

A utility: Instant IPA in Word - converts SAMPA to IPA.

To refer to SAMPA cite this website (www.phon.ucl.ac.uk/home/sampa) or the printed version: [Wells, J.C.], 1997. 'SAMPA computer readable phonetic alphabet'. In Gibbon, D., Moore, R. and Winski, R. (eds.), 1997. Handbook of Standards and Resources for Spoken Language Systems. Berlin and New York: Mouton de Gruyter. Part IV, section B.

For queries please contact: John Wells by e-mail or at
Department of Speech, Hearing and Phonetic Sciences, University College London, Gower Street, London WC1E 6BT.
Last revised 2005 October 25

Phonetic Symbols, Keyboards and Transcription


This page provides some information and resources for the use of the International Phonetic Association's alphabet of phonetic symbols for the transcription of speech using computers.

Unicode Phonetic Keyboard and SIL Fonts


The Unicode Phonetic Keyboard is an installable keyboard for Windows PCs that provides a convenient keyboard layout for the word-processing of phonetic transcription using Unicode fonts. The installation package comes complete with two Unicode fonts: Doulos and Charis that have been developed by SIL.

Guide to the use of Unicode Phonetic Symbols


The IPA Symbols in Unicode Guide to Using Unicode Phonetic Symbols

Download Unicode Phonetic Keyboard 1.02 (self-install executable, 2MB) Download Unicode Phonetic Keyboard 1.10 (Vista/Vista64) (self-install executable, 2MB) Download Keyboard Layout (PDF)

John Wells has written a number of pages which give more information about the set of phonetic symbols available in Unicode, and about how these can be used in Microsoft Word and other applications:

IPA-SAM Phonetic Fonts

The IPA-SAM fonts are a set of TrueType fonts (not Unicode) suitable for Windows and MacOS that include all current IPA symbols. The keyboard layout is designed to be compatible with SAMPA.

SAMPA - SAM Phonetic Alphabet


SAMPA is a phonetic transcription coding that uses normal ASCII characters as replacements for IPA symbols. It has been designed for phonemic/broad phonetic transcription of European languages.

Spoken Phonetic Transcription


The PHONWEB page provides a facility to read out SAMPA phonetic transcription for English in synthetic speech!

The International Phonetic Alphabet in Unicode

Displaying IPA symbols

Inserting IPA symbols in web documents

Unicode decimal numbers for IPA symbols

Displaying IPA symbols


For you to be able to display Unicode phonetic symbols correctly on your web browser,
  the browser must be Unicode-compliant (all current browsers are);
  you must be running Windows 95 or later, or, on a Macintosh, OS X (otherwise, and for Unix or Linux, see advice from the Unicode site);
  you must have installed a Unicode font that includes the IPA symbols;
  and the web document you are displaying must specify this font with either a style sheet {font-family} declaration or an in-line <font face> tag.

The list of such Windows TrueType/OpenType fonts currently available and that I recommend is as follows.
  Arial, with Windows Vista; not previous versions
  Arial Unicode MS
  Charis SIL (download), an excellent font from SIL
  Courier New, with Windows Vista; not previous versions
  Doulos SIL (download), the familiar SIL Doulos font, now in comprehensive Unicode version
  Gentium (download)
  Lucida Grande, with Mac OS X/Safari and later
  Lucida Sans Unicode (download), blocky, but widely available (supplied with many versions of Windows)
  Microsoft Sans Serif, with Windows Vista and later; not previous versions
  Segoe UI, with Windows Vista and later
  Tahoma, with Windows Vista and later; not previous versions
  Times New Roman, with Windows Vista and later; not previous versions

The style sheet in the head of this document specifies the font Arial Unicode MS, or failing that Lucida Sans Unicode. There is also another version, with no font specified, that you can use to test fonts.

Inserting IPA symbols in web documents


There are several ways to insert Unicode IPA symbols into your HTML files: by using MS Word (97 and later), or by using a numeric code.

In Word, with a Unicode font selected, use Insert | Symbol (normal text) and scroll down the box until you find the character you want. Select it, and Insert. With Word 2003 and later, you can alternatively type in the Unicode hex number (see below), select it, and do Alt-X. The character will appear. If you are going to use the character frequently, it might be worthwhile assigning a Shortcut Key (macro) for it. You can also use the program Character Map to find your character, then select, copy and paste it. Or you can use a keyboard facility such as this.

Afterwards, save the document using File | Save as HTML. Word will automatically convert the character into the corresponding numeric entity (see next para) or the corresponding UTF-8 encoding.

Alternatively, write direct HTML, referencing each IPA symbol using the code numbers listed below. You can do this using either decimal or hex numbers. To create such a "numeric entity", you put ampersand (&), number sign (#), the Unicode number for the symbol, and semicolon. If using hex numbers, you must place an x between the number sign and the number. For example, to include the velar nasal symbol, ŋ, which has the Unicode decimal number 331, write &#331;, or, since its hex number is 014B, you can alternatively write &#x014B;. To transcribe the English word thing, θɪŋ, write &#952;&#618;&#331; or, alternatively, &#x03B8;&#x026A;&#x014B;.

The browser will render these with the correct IPA symbols, always provided an appropriate font is available. Force the use of an appropriate font by including a font tag as mentioned above, for example in your cascading style sheet, p {font-family:"lucida sans unicode";}, or in the text, an in-line tag <font face="Lucida Sans Unicode">.
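The numeric-entity rule just described (ampersand, number sign, optional x, number, semicolon) is mechanical enough to script. The following Python sketch is one possible way to do it; the function name `to_numeric_entities` is made up for this example, and it assumes that plain ASCII characters should be left as they are.

```python
import html


def to_numeric_entities(text, use_hex=False):
    """Encode every non-ASCII character as an HTML numeric entity."""
    parts = []
    for ch in text:
        codepoint = ord(ch)
        if codepoint < 128:
            parts.append(ch)  # assumption: ASCII is passed through as-is
        elif use_hex:
            parts.append("&#x{:04X};".format(codepoint))
        else:
            parts.append("&#{};".format(codepoint))
    return "".join(parts)


if __name__ == "__main__":
    thing = "\u03b8\u026a\u014b"  # theta, small capital I, eng
    print(to_numeric_entities(thing))
    print(to_numeric_entities(thing, use_hex=True))
    # html.unescape reverses either form
    print(html.unescape(to_numeric_entities(thing)) == thing)
```

The standard-library `html.unescape` decodes both the decimal and the hex form, which gives a quick way to check a hand-written entity string.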

Unicode decimal and hex numbers for IPA symbols


The Unicode manual lists code numbers only in hexadecimal. Here they are listed with their decimal numbers as well.

Alphabetic | Spacing diacritics | Non-spacing diacritics | Arrows

Alphabetic (excluding the standard characters a-z)

Symbol  decimal  hex   value
ɑ       593      0251  open back unrounded
ɐ       592      0250  open-mid schwa
ɒ       594      0252  open back rounded
æ       230      00E6  raised open front unrounded
ɓ       595      0253  vd bilabial implosive
ʙ       665      0299  vd bilabial trill
β       946      03B2  vd bilabial fricative
ɔ       596      0254  open-mid back rounded
ɕ       597      0255  vl alveolopalatal fricative
ç       231      00E7  vl palatal fricative
ɗ       599      0257  vd alveolar implosive
ɖ       598      0256  vd retroflex plosive
ð       240      00F0  vd dental fricative
ʤ       676      02A4  vd postalveolar affricate
ə       601      0259  schwa
ɘ       600      0258  close-mid schwa
ɚ       602      025A  rhotacized schwa
ɛ       603      025B  open-mid front unrounded
ɜ       604      025C  open-mid central
ɝ       605      025D  rhotacized open-mid central
ɞ       606      025E  open-mid central rounded
ɟ       607      025F  vd palatal plosive
ʄ       644      0284  vd palatal implosive
ɡ       609      0261  vd velar plosive (but the IPA has ruled that an ordinary g is also acceptable)
ɠ       608      0260  vd velar implosive
ɢ       610      0262  vd uvular plosive
ʛ       667      029B  vd uvular implosive
ɦ       614      0266  vd glottal fricative
ɧ       615      0267  vl multiple-place fricative
ħ       295      0127  vl pharyngeal fricative
ɥ       613      0265  labial-palatal approximant
ʜ       668      029C  vl epiglottal fricative
ɨ       616      0268  close central unrounded
ɪ       618      026A  lax close front unrounded
ʝ       669      029D  vd palatal fricative
ɭ       621      026D  vd retroflex lateral
ɬ       620      026C  vl alveolar lateral fricative
ɫ       619      026B  velarized vd alveolar lateral
ɮ       622      026E  vd alveolar lateral fricative
ʟ       671      029F  vd velar lateral
ɱ       625      0271  vd labiodental nasal
ɯ       623      026F  close back unrounded
ɰ       624      0270  velar approximant
ŋ       331      014B  vd velar nasal
ɳ       627      0273  vd retroflex nasal
ɲ       626      0272  vd palatal nasal
ɴ       628      0274  vd uvular nasal
ø       248      00F8  front close-mid rounded
ɵ       629      0275  rounded schwa
ɸ       632      0278  vl bilabial fricative
θ       952      03B8  vl dental fricative
œ       339      0153  front open-mid rounded
ɶ       630      0276  front open rounded
ʘ       664      0298  bilabial click
ɹ       633      0279  vd (post)alveolar approximant
ɺ       634      027A  vd alveolar lateral flap
ɾ       638      027E  vd alveolar tap
ɻ       635      027B  vd retroflex approximant
ʀ       640      0280  vd uvular trill
ʁ       641      0281  vd uvular fricative
ɽ       637      027D  vd retroflex flap
ʂ       642      0282  vl retroflex fricative
ʃ       643      0283  vl postalveolar fricative
ʈ       648      0288  vl retroflex plosive
ʧ       679      02A7  vl postalveolar affricate
ʉ       649      0289  close central rounded
ʊ       650      028A  lax close back rounded
ʋ       651      028B  vd labiodental approximant
ⱱ       11377    2C71  voiced labiodental flap
ʌ       652      028C  open-mid back unrounded
ɣ       611      0263  vd velar fricative
ɤ       612      0264  close-mid back unrounded
ʍ       653      028D  vl labial-velar fricative
χ       967      03C7  vl uvular fricative
ʎ       654      028E  vd palatal lateral
ʏ       655      028F  lax close front rounded
ʑ       657      0291  vd alveolopalatal fricative
ʐ       656      0290  vd retroflex fricative
ʒ       658      0292  vd postalveolar fricative
ʔ       660      0294  glottal plosive
ʡ       673      02A1  vd epiglottal plosive
ʕ       661      0295  vd pharyngeal fricative
ʢ       674      02A2  vd epiglottal fricative
ǀ       448      01C0  dental click
ǁ       449      01C1  alveolar lateral click
ǂ       450      01C2  alveolar click
ǃ       451      01C3  retroflex click
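Since the decimal and hex columns are the same number in two bases, either can be derived from the other, and the character itself follows from the code point. A minimal Python sketch, with `describe` as a hypothetical helper name, looks up a hex code and reports its decimal value, the character, and the official Unicode character name (which may differ from the phonetic value given in the table):

```python
import unicodedata


def describe(hex_code):
    """Return (decimal, character, Unicode name) for a hex code point."""
    codepoint = int(hex_code, 16)
    character = chr(codepoint)
    # The second argument is a fallback for code points that the local
    # Unicode database does not know.
    name = unicodedata.name(character, "UNKNOWN")
    return codepoint, character, name


if __name__ == "__main__":
    for hex_code in ("014B", "0259", "03B8"):
        decimal, character, name = describe(hex_code)
        print(hex_code, decimal, character, name)
```

Going the other way, `format(331, "04X")` gives back `014B`.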

Spacing diacritics and suprasegmentals

To study these, you may find it helpful to set your browser text size to Largest.

Symbol  decimal  hex   value
ˈ       712      02C8  (primary) stress mark
ˌ       716      02CC  secondary stress
ː       720      02D0  length mark (NB: there is a bug in some versions of MS IExplorer that causes this character not to display. It is probably best to use a simple colon instead.)
ˑ       721      02D1  half-length
ʼ       700      02BC  ejective
ʴ       692      02B4  rhotacized
ʰ       688      02B0  aspirated
ʱ       689      02B1  breathy-voice-aspirated
ʲ       690      02B2  palatalized
ʷ       695      02B7  labialized
ˠ       736      02E0  velarized
ˤ       740      02E4  pharyngealized
˞       734      02DE  rhotacized

Note the ready-made characters 602 025A (combining 601 0259 and 734 02DE) and 605 025D (combining 604 025C and 734 02DE).

Non-spacing diacritics and suprasegmentals

As you can see, several of these are unsatisfactory, particularly in smaller sizes. They are shown here with an appropriate supporting base character. When composing a text in HTML, enter the diacritic after the base character; thus for voiceless n, write n&#805;. The browser automatically backspaces the diacritic, but by a constant amount, which may or may not produce a satisfactory result.

Base   decimal  hex   value
n d    805      0325  voiceless
       778      030A  voiceless (use if character has descender)
b a    804      0324  breathy voiced
t d    810      032A  dental
s t    812      032C  voiced
b a    816      0330  creaky voiced
t d    826      033A  apical
t d    828      033C  linguolabial
t d    827      033B  laminal
t      794      031A  not audibly released
       825      0339  more rounded
       771      0303  nasalized
       796      031C  less rounded
u      799      031F  advanced
e      800      0320  retracted
e      776      0308  centralized
l n    820      0334  velarized or pharyngealized
       619      026B  (ready-made combination, dark l)
e      829      033D  mid-centralized
e      797      031D  raised
m n l  809      0329  syllabic
e      798      031E  lowered
e      815      032F  non-syllabic
e      792      0318  advanced tongue root
e      793      0319  retracted tongue root
e      774      0306  extra-short
e      779      030B  extra high tone
       769      0301  high tone
e      772      0304  mid tone
       768      0300  low tone
e      783      030F  extra low tone
x x    860      035C  tie bar below
x x    865      0361  tie bar above

Arrows

Symbol  decimal  hex   value
↓       8595     2193  downstep
↑       8593     2191  upstep
→       8594     2192  (becomes, is realized as; not recognized by the IPA)
↗       8599     2197  global rise
↘       8600     2198  global fall
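The difference between spacing and non-spacing diacritics can be checked programmatically. The Python sketch below is an illustration only, with hypothetical helper names `add_diacritic` and `is_non_spacing`; it builds a base-plus-diacritic sequence in the order recommended above (base first) and uses the Unicode general category `Mn` (nonspacing mark) to tell the two groups apart.

```python
import html
import unicodedata


def add_diacritic(base, diacritic_decimal):
    """Append a diacritic, given by decimal code, after the base character."""
    return base + chr(diacritic_decimal)


def is_non_spacing(decimal):
    """True if the code point is a nonspacing combining mark (category Mn)."""
    return unicodedata.category(chr(decimal)) == "Mn"


if __name__ == "__main__":
    voiceless_n = add_diacritic("n", 805)
    # Same result as decoding the HTML form n&#805;
    print(voiceless_n == html.unescape("n&#805;"))
    print(is_non_spacing(805))  # ring below, from the non-spacing table
    print(is_non_spacing(688))  # aspiration mark, from the spacing table
```

The result is still two code points; how well the diacritic is positioned under the base is up to the font and renderer, as the text above warns.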

For a much more thorough discussion of displaying and using Unicode characters, see Alan Wood's Unicode resources.

Put phonetic symbols in your Word document


This page is intended to help you get phonetic symbols into a word-processed document. It is intended primarily for people running Word 97 or later under Windows 98 or later, using the Unicode phonetic symbol font routinely supplied by Microsoft. For other Office applications - Excel, Access, Powerpoint, Outlook, Outlook Express - see below.

1. Check that you have a phonetic font available.


On UCL cluster installations, this means the font Lucida Sans Unicode. Open a blank document in Word. Click on the font box and select Lucida Sans Unicode from the drop-down menu. If you haven't already got this font installed, download it and install it. Alternatively, you may prefer to use some other Unicode phonetic font, e.g. Arial Unicode MS, Charis SIL (download), Doulos SIL (download) or Gentium (download).


NB. Most other fonts do not include phonetic symbols.

2. Find the phonetic symbol you want


There are several ways to do this. Choose the way you find easiest.
1. copy and paste
2. Insert | Symbol
3. Character Map
4. AutoCorrect ("Eureka")
5. (Word 2002) Alt-x
6. Phonetic Keyboard

(a) Copy and paste from the symbols below.


Find the symbol you need in the table below, then copy (Ctrl-C) and paste (Ctrl-V) it into your document. (You could do View | Text Size | Larger to see the symbols more easily.)

Symbols essential for English phonetics

Length marks, stress marks, diacritics


(put diacritics AFTER the letter they go with)

Syllabic, devoiced

Other useful symbols

Useful letter-plus-diacritic combinations t d n r l

Advantages: Straightforward.
Disadvantages: You need to keep this page on-screen. Some symbols are not shown here.

(b) Do Insert | Symbol, and find the symbol in the drop-down box that appears.

Advantages: Easy. You can even define a Shortcut Key for a symbol you need a lot.
Disadvantages: Fiddly. The on-screen symbols are very small. Some diacritics are too small to distinguish. (This criticism does not apply to Word 2002, where the drop-down box is much improved.)

(c) Use the programme Character Map, which you can launch from Start | Programs | Accessories | System Tools | Character Map. Remember to select the font Lucida Sans Unicode.

Advantages: Gives a label for each character, helping you to be sure you've got the right one.
Disadvantages: Even more fiddly. Doesn't work in older versions of Windows.

(d) Read the articles Eureka and Eureka-IPA and create AutoCorrect shortcuts as described there.

Advantages: Excellent if you need to use each phonetic symbol several times. Builds on (b) above.
Disadvantages: Takes some time to set up.

(e) In Word 2002, type the symbol's Unicode number and do Alt-x. The Unicode number must be in hexadecimal form; e.g. the number for the velar nasal is 014B. (A complete list of the hex numbers of phonetic symbols.)

Advantages: Easy, if you know the number.
Disadvantages: You need to know the number. Does not work with previous versions of Word.

(f) Use a phonetic keyboard. You can install a virtual keyboard allowing you to access phonetic symbols by using the ordinary keys. I recommend Mark Huckvale's Unicode Phonetic Keyboard, which you can download free from here. (Windows PC only.)

Advantages: Easy, using the chart supplied; does not require knowledge of Unicode numbers.
Disadvantages: You have to toggle into and out of the special keyboard.

Other points to watch


Lucida Sans Unicode is a big font. If you mix it with other fonts it may disturb the line height. So if you want the rest to be in Times New Roman size 12, for example, set the Lucida Sans Unicode to size 10. With Times 10, use Lucida 8.

In Word, you should disable certain AutoCorrect functions that will otherwise automatically make undesirable symbol changes, notably the "correcting" of i to I (since you may want to use the phonetic symbol i on its own).

Be careful over confusingly similar phonetic symbols.

For other Office applications


...the best way seems to be
to use method 2(a) above (for a single symbol);
to create a Word document with the string of symbols you want, and then copy (Ctrl-C) and paste (Ctrl-V) them into Excel/Powerpoint etc.;
or to use a virtual keyboard.

This works with any Unicode-enabled application, but not of course with those that are not Unicode-compliant.

Assignment of ASCII Characters to IPA Symbols


IPA stands for International Phonetic Alphabet. Often, it is useful (or even necessary) to represent IPA symbols by ASCII characters. The following table proposes an assignment of ASCII characters to IPA symbols, such that the shape of the ASCII character corresponds most naturally to the shape of the IPA symbol (e.g., ASCII L for IPA ʟ). Wherever this is impossible, other principles that have been followed by the author are: the frequency of the sound in various languages (e.g., ASCII R is assigned to IPA alveolar trill r rather than uvular trill ʀ, while ASCII r is assigned to the alveolar approximant ɹ, the English r), and that a single ASCII character should represent each symbol in the two main tables of consonants and vowels.

Legend: IPA symbol D
ASCII character

In the table of pulmonic consonants, below, the first symbol within a cell denotes an unvoiced sound (e.g., t), while the second symbol denotes the corresponding voiced sound (e.g., d).

Pulmoni Labiode Alveola Postalve Retrofl c Bilabial Dental Palatal ntal r olar ex Consona nts Plosive p Nasal Trill Tap or Flap Fricative F Lateral fricative Approxi mant Lateral approxi mant B f v T D s 4 V b m } M t d n R P z 5 r { | j K S Z $ [ 2 C J x \ / % c ] # k

Velar

Uvular

Pharyng Glottal eal ?

g q N

G 7 8

+ X

Q H

9 h

W L

In the table of vowels, below, wherever symbols appear in pairs, the leftmost symbol of the pair denotes an unrounded vowel, while the rightmost symbol denotes the corresponding rounded vowel. Vowels Close Front i y I Close-mid e 0 Y ) =
@

Central 1

Back w U u

,
; * ^
A

Open-mid

~ &

Open

<

The above two tables span the range of most common sounds (pulmonic consonants and vowels). There are a few remaining ASCII characters (., !, :, ', >, _ and `), and a number of sounds (non-pulmonic consonants, affricates) and symbols that are not included in the above tables. We suggest using some of the remaining ASCII characters as starting tokens of "escape sequences" to represent the remaining IPA symbols.
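To make the escape-sequence idea concrete, here is a small Python sketch of how a transcription in this scheme might be split into symbols. It assumes, as the sequences listed below suggest, that every escape sequence is exactly two characters long and starts with `!` or `.`; the names `ESCAPE_STARTERS` and `tokenize` are invented for this example.

```python
# Assumption: "!" and "." each introduce a two-character escape sequence,
# and every other character is a symbol on its own.
ESCAPE_STARTERS = {"!", "."}


def tokenize(text):
    """Split an ASCII transcription into single symbols and escape sequences."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] in ESCAPE_STARTERS and i + 1 < len(text):
            tokens.append(text[i:i + 2])  # starter plus the following character
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize("!ba.m"))
```

A starter character at the very end of the string has nothing to combine with, so this sketch returns it as a token by itself rather than raising an error.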

The following table suggests escape sequences (all starting with the ! symbol) for non-pulmonic consonants:

Non-pulmonic Consonants

Clicks
ASCII sequence  description
!0              bilabial
!|              dental
!!              (post)alveolar
!=              palatoalveolar
!#              alveolar lateral

Voiced Implosives
ASCII sequence  description
!b              bilabial
!d              dental/alveolar
!f              palatal
!g              velar
!G              uvular

Ejectives
ASCII sequence  description
!'
!p              bilabial
!t              dental/alveolar
!k              velar
!s              alveolar fricative

The following table suggests escape sequences (all starting with the . symbol) for the affricates, other double articulations, and other symbols:

Other symbols
ASCII sequence  description
.m              voiceless labio-velar fricative
.w              voiced labio-velar approximant
.h              voiced labio-palatal approximant
.H              voiceless epiglottal fricative
.9              voiced epiglottal fricative
.I              alveolar lateral flap

Affricates and other symbols

.s

.S

.?

epiglottal plosive

.X

simultaneou s
and

.z

.c

alveolopalatal fricative alveolopalatal fricative

'

primary stress

.Z

.7

secondary stress

You may also want to take a look at this page, which describes various other conventions, considered more-or-less "standard". A related page of mine is: Greek Sounds in the International Phonetic Alphabet
