Proceedings of the Conference on Language & Technology 2009
English to Urdu Transliteration System
Abbas Raza Ali and Madiha Ijaz

Center for Research in Urdu Language Processing,
National University of Computer and Emerging Sciences, Lahore, Pakistan
{abbas.raza, madiha.ijaz}@nu.edu.pk
Abstract

Urdu language processing applications frequently encounter non-Urdu text, especially English text. The accuracy of such systems, e.g. machine translation and text-to-speech, is severely undermined because they are unable to handle English text. One possibility is to add multilingual processing capabilities to Urdu language processing applications so that they can handle English alongside Urdu, but this approach is quite taxing. Another approach is to transliterate English text into Urdu automatically and then pass it on to the Urdu language processing applications.

This paper describes an English to Urdu transliteration system. First the mapping rules used to generate Urdu text from English transcription are discussed, then the syllabification, manual transliteration and Urduization phases are described, and finally the issues related to Out-Of-Vocabulary (OOV) words are discussed.

1. Introduction

Transliteration is a method of transcribing words from one script into phonetically equivalent words in another script. Transliteration rules map the letters of the source script alphabet to the letters of the target script alphabet based on phonetic similarity. This process is very successful for transliterating names of people, places, companies, etc., because translation dictionaries can never be comprehensive and are ineffective for translating proper nouns [1].

Websites, user interfaces etc. contain a lot of English text along with Urdu text. The text is read by applications such as screen readers and web page readers and is passed to an Urdu language processing application, e.g. machine translation for translation or a text-to-speech (TTS) system for speech generation. An Urdu TTS or machine translation system that is unable to handle English text discards it, and as a result the generated speech or translation lacks coherence. The English to Urdu transliteration system is being developed to eradicate this discrepancy, as shown in Table 1.

Table 1: Effect of transliteration on Urdu TTS
[Rows: Urdu Text; With Transliteration; Without Transliteration. The example text contains the English words "Alex" and "Nokia"; the Urdu-script cells are illegible.]

In order to develop the English to Urdu transliteration system, a rule-based approach that transliterates from English orthography to Urdu orthography was explored first, but it was soon realized that it would not work well because there is no one-to-one mapping between English orthography and its corresponding sounds. For example, the /ʃ/ sound is represented using six different letter combinations: motion /moʊ.ʃən/, ocean /oʊ.ʃən/, sure /ʃʊ.ər/, she /ʃi/, admission /æɖ.mɪ.ʃən/, machine /mə.ʃin/ etc. [2]. Pronunciation-based transliteration was therefore chosen, as it produced better results.

An Arpabet-based English pronunciation lexicon is used for acquiring the pronunciation of English words. English text is converted to Urdu using the English pronunciation and mapping rules. The English pronunciation lexicon is based on the American accent, hence the transliteration into Urdu also reflects the American accent. Frequently used English words are transliterated manually, and some rules are applied for Urduization of the transliterated text in order to make it appropriate and as close as possible to the local
accent, i.e. the accent used in Pakistan when speaking English.

The Out-Of-Vocabulary problem is resolved using statistical techniques, by first aligning English orthography to pronunciation sequences. The optimal pronunciation of an unknown word is computed by picking the most probable pronunciation, which is then passed through the same transliteration process.

The architecture of the English to Urdu transliteration system is shown in figure 1.

[Flow diagram. Legible box labels: English Text; OOV; Load transliteration and language model; Computing Optimal Pronunciation Sequence; applying Syllabification; applying Urduization; Converted Urdu Script.]
Figure 1: Architecture of English-to-Urdu transliteration

2. English to Urdu mapping

The CMU pronouncing dictionary (v 0.7a) is used to acquire the pronunciation of English words. The dictionary comprises 125,000 English words and their corresponding transcriptions in Arpabet. The pronunciation provided is based on the American accent [11].

The phonemic inventory of English comprises 24 consonants and 15 vowels; the phonemic inventory of Urdu comprises 37 consonants and 16 vowels (Appendices B and C). English consonants can be easily mapped to Urdu consonants and there is one-to-one correspondence between them in all cases. There are some sounds in English, e.g. the dental fricatives /θ/ and /ð/, which are non-existent in Urdu and hence are mapped to their closest counterparts, i.e. the dental stops /t̪ʰ/ and /d̪/ respectively.

Some sounds have multiple realizations in Urdu orthography, e.g. /s/ can be realized as ث، س، ص etc., so in this case only the most commonly used letter is chosen, which is س. Similar is the case with /z/, which can be realized as ظ، ذ، ز، ض, and with /t̪/, which can be realized either as ت or ط.

Vowels in Urdu are represented using the diacritics zabar, zer and paish and the four letters alif, wao, choti yeh and bari yeh. Diacritics combined with consonants form short vowels, while diacritics combined with alif, wao, choti yeh and bari yeh form long vowels [3]. The same vowel is represented differently in orthography depending on whether it occurs word initially, medially or finally.

Short vowels occurring word initially use alif as a placeholder, e.g. "urban" is transliterated to اَربَن /ˈər.bən/, but when they occur word medially they are represented only by the diacritics, e.g. "justly" is transliterated to جَسٹلِى /ˈʤəsʈ.li/.

Short vowels occurring word finally are transformed into their corresponding long vowel, i.e. zabar is converted to alif, e.g. "Andorra" /æ n d ɔ ɹ ʌ/ is transliterated to اَنڈَورَا /æn.ˈɖɔ.rɑ/; similarly zer is converted to choti yeh and paish is converted to wao.

Hence in most cases there is no one-to-one correspondence between English and Urdu vowels, and an English vowel is transliterated using multiple Urdu characters depending on whether it occurs word initially, medially or finally, as shown in table 2.

Table 2: English vowels mapping to Urdu orthography
Arpabet  IPA  Urdu (by position in the word)
AA       ɑ    initial آ, middle ◌َا, final ◌َا
AE       æ    initial اَى, middle ◌َى, final ے
AY       aɪ   [illegible]
AW       aʊ   initial آؤ, middle ◌َاؤ, final ◌َاؤ
AO       ɔ    initial اَو, middle ◌َو, final ◌َو
OY       ɔɪ   [illegible]
EH       ɛ    initial اَى, middle ◌َى, final ے
ER       ɝ    initial اَر, middle ◌َر, final ◌َر
EY       eɪ   initial اى, middle ى, final ے
IH       ɪ    initial اِ, middle ◌ِ, final ◌ِ
IY       i    initial اِى, middle ◌ِى, final ◌ِى
OW       oʊ   initial او, middle و, final و
UH       ʊ    initial اُ, middle ◌ُ, final ◌ُ
AH       ʌ    initial اَ, middle ◌َ, final ◌َا

3. Syllabification

English-to-Urdu transliteration using the CMU pronunciation dictionary, which is based on the American accent, generates a lot of inconsistency. To improve the system's accuracy, Urdu syllabification is applied to the English transcription, as shown in table 3.

Consonants and vowels combine to make syllables, and breaking up a word into syllables is known as syllabification. The sonority sequencing principle is commonly used for syllabification in Urdu. It requires the onset to rise in sonority towards the nucleus and the coda to fall in sonority from the nucleus [9].

3.1. Algorithm

A template matching technique is used to syllabify the English transcription. In this technique syllabification is done by matching a template of the form $C_{0,1} V C_{n}$ [4]. Urdu allows only one consonant in the onset position, while multiple consonants can come in the coda position of the syllable.

1. Convert the entire phonemic transcription of the word to consonant-vowel pairs
2. Start from the end of the word, traverse backwards to find the next vowel
3. repeat
4.     if there is a consonant preceding it
5.         mark a syllable boundary before the consonant
6.     else
7.         mark the syllable boundary before this vowel
8.     end if
9. until the entire string is consumed

Table 3: Transliteration after applying syllabification
English     IPA (syllabified)   Urdu (unsyllabified)   Urdu (syllabified)
Associate   ʌ.soʊ.ʃi.ʌt         [illegible]            [illegible]
Oblivious   ʌb.lɪ.vi.ʌs         [illegible]            [illegible]
Obedient    oʊ.bi.di.ʌnt        [illegible]            [illegible]

3.2. Special case

After syllabification the problem of local accent remains, because the transliteration is based on the American accent. To bring the transliteration closer to the Urdu accent, some rules are applied to the syllabified transliterated text.

3.2.1. Consonant Cluster. Urdu syllabification does not allow a consonant cluster in the onset of a syllable or in the word medial position. In this case, add /ɪ/ between the two consonants if the second consonant is 'r' or 'l', otherwise add /ʌ/, and mark a syllable boundary after it, as shown in table 4.

Table 4: Examples of consonant cluster problem
English      English IPA   Urdu          Urdu IPA
Treehopper   tɹihɑpɝ       [illegible]   tɪ.ɹi.hɑ.pɝ
Bless        blɛs          [illegible]   bɪ.lɛs
Quickly      kwɪkli        [illegible]   kʌ.wɪk.li

3.2.2. Urduization. If two consonants come in the onset of the syllable in the word initial position and the starting consonant is 's', then add /ɪ/ before 's', as shown in table 5.

Table 5: Examples of Urduization applied on transliterated Urdu text
English   English IPA   Urdu          Urdu IPA
School    skul          [illegible]   ɪs.kul
Skill     skɪl          [illegible]   ɪs.kɪl
Special   spɛʃʌl        [illegible]   ɪs.pɛ.ʃʌl
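The right-to-left procedure of Section 3.1 and the two accent rules of Section 3.2 can be sketched in Python as follows. This is a minimal sketch and not the authors' implementation: the function names `split_syllables` and `syllabify` are hypothetical, phonemes are assumed to be Arpabet symbols without stress digits, the cluster rule is applied only to word-initial consonants that are left without a nucleus, and the 's' rule is assumed to take precedence over the 'r'/'l' rule.

```python
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}


def split_syllables(phonemes):
    """Right-to-left template matching with at most one onset consonant.

    Returns (lead, syllables); `lead` holds word-initial consonants that
    could not be placed in any onset.
    """
    syllables = []
    end = len(phonemes)
    i = end - 1
    while i >= 0:
        if phonemes[i] not in VOWELS:
            i -= 1  # keep moving left until a vowel is found
            continue
        # Boundary goes before the preceding consonant, else before the vowel.
        start = i - 1 if i > 0 and phonemes[i - 1] not in VOWELS else i
        syllables.insert(0, phonemes[start:end])
        end = start
        i = start - 1
    return phonemes[:end], syllables


def syllabify(phonemes):
    """Syllabify and repair word-initial clusters (Sections 3.2.1 and 3.2.2)."""
    lead, syllables = split_syllables(phonemes)
    if not syllables:  # no vowel at all: nothing to repair
        return [lead] if lead else []
    cluster = lead + [syllables[0][0]]
    repaired = []
    first = 0
    if lead and cluster[0] == "S":
        repaired.append(["IH", "S"])  # Urduization: /I/ before initial 's'
        first = 1
    for j in range(first, len(lead)):
        # Assumed reading of 3.2.1: /I/ before 'r' or 'l', otherwise /AH/.
        inserted = "IH" if cluster[j + 1] in ("R", "L") else "AH"
        repaired.append([lead[j], inserted])
    return repaired + syllables
```

Under these assumptions the sketch reproduces the syllabified forms listed in tables 3 to 5; for example `syllabify("S K UW L".split())` gives `[["IH", "S"], ["K", "UW", "L"]]`, i.e. /ɪs.kul/.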
4. Out-Of-Vocabulary problem

Out-Of-Vocabulary words are a very common problem in various systems such as text-to-speech, machine translation and cross-language information retrieval (CLIR). To resolve this problem, the alignment between English phonemes and orthography has to be found probabilistically to obtain a one-to-one mapping between them, as shown in table 6. The aligned sequences are then used for training, to obtain the most probable pronunciation for an unknown word.

Table 6: English orthography to pronunciation alignment
English         Percentages
Pronunciation   p . er . s . eh . n . t . ih . jh . ah . z
Alignment       p(p) . er(er) . c(s) . e(eh) . n(n) . t(t) . a(ih) . g(jh) . e(ah) . s(z)

The entire procedure consists of two steps:
• English orthography to pronunciation alignment.
• Computing optimal pronunciation sequence.
After the pronunciation of the unknown text is obtained, it is passed through the same procedure, i.e. syllabification and then Urdu transliteration. The architecture of the OOV module is shown in figure 2.

[Flow diagram. Boxes from top to bottom: CMU pronunciation dictionary; English orthography to phoneme alignment; Pronunciation parameter estimation; Pronunciation parameter optimization; Bigram transliteration model and trigram language model probabilities.]
Figure 2: Architecture of English-to-Urdu transliteration

4.1. Orthography to pronunciation alignment

In this step all the valid combinations of the English orthography $\mathit{En}$ with its pronunciation sequence $\mathit{Pr}$ are produced using conditional probability:

$$\hat{\mathit{Pr}} = \arg\max_{\mathit{Pr}} p(\mathit{Pr} \mid \mathit{En}) = \arg\max_{\mathit{Pr}} p(\mathit{En} \mid \mathit{Pr})\, p(\mathit{Pr}) \qquad (1)$$

The trigram language model $p(\mathit{Pr}_{i-1} . \mathit{Pr}_{i} . \mathit{Pr}_{i+1})$ and the bigram transliteration model $p(\mathit{En}_{i} . \mathit{Pr}_{i})$ are combined to maximize the pronunciation probability $\hat{\mathit{Pr}}$.

4.2. Computing optimal pronunciation sequence

The expectation maximization algorithm is used to compute the optimal alignment sequence. The algorithm is given below.

Initialization
    For each English phoneme to orthography pair, assign equal weights to all possibilities generated from (1).
repeat
    Expectation-Step
        For each of the Arpabet phonemes, count up instances of its different mappings from the observations on all combinations produced in (1). Normalize the scores so that the mapping probabilities sum to 1.
    Maximization-Step
        Recalculate the combination scores. Each combination is scored with the product of the scores of the symbol mappings it contains. Normalize the scores so that the mapping probabilities sum to 1.
until convergence

5. Results

The transliteration process becomes more accurate after applying syllabification to the pronunciation and after finding probabilistic pronunciation sequences for Out-Of-Vocabulary words, as shown in figure 3.

[Stack diagram. Modules from top to bottom: English-to-Urdu Mappings; Syllabification (to achieve more accuracy); Urduization Rules (to achieve more accuracy); Out-Of-Vocabulary (to enhance overall capability).]
Figure 3: Modules of the system that lead it towards maturity
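As an aside on the alignment training of Section 4.2, the expectation maximization loop can be illustrated with a small Python sketch. It is a toy illustration under stated assumptions rather than the system's implementation: the names `segmentations` and `em_align` are hypothetical, every phoneme is assumed to align with one contiguous, non-empty run of letters, each word is assumed to have at least as many letters as phonemes, and a fixed number of iterations stands in for a convergence test. The step names in the comments follow the paper's wording.

```python
from collections import defaultdict
from itertools import combinations
from math import prod


def segmentations(letters, n):
    """Yield every split of `letters` into n contiguous, non-empty chunks."""
    for cuts in combinations(range(1, len(letters)), n - 1):
        bounds = (0, *cuts, len(letters))
        yield tuple(letters[a:b] for a, b in zip(bounds, bounds[1:]))


def em_align(pairs, iterations=10):
    """pairs: list of (spelling, [phonemes]).

    Returns (mapping probabilities, best alignment per word).
    """
    candidates = [list(segmentations(word, len(phones)))
                  for word, phones in pairs]
    # Initialization: equal weight for every candidate alignment of a word.
    weights = [[1.0 / len(c)] * len(c) for c in candidates]
    probs = {}
    for _round in range(iterations):
        # Expectation-Step: weighted counts of phoneme-to-chunk mappings,
        # normalized so that the mappings of each phoneme sum to 1.
        counts = defaultdict(lambda: defaultdict(float))
        for (word, phones), cands, ws in zip(pairs, candidates, weights):
            for seg, w in zip(cands, ws):
                for phone, chunk in zip(phones, seg):
                    counts[phone][chunk] += w
        probs = {p: {c: v / sum(cs.values()) for c, v in cs.items()}
                 for p, cs in counts.items()}
        # Maximization-Step: score each alignment by the product of its
        # mapping probabilities, then normalize per word.
        weights = []
        for (word, phones), cands in zip(pairs, candidates):
            scores = [prod(probs[p][c] for p, c in zip(phones, seg))
                      for seg in cands]
            total = sum(scores)
            weights.append([s / total for s in scores])
    best = [cands[ws.index(max(ws))]
            for cands, ws in zip(candidates, weights)]
    return probs, best


# Toy training data (made up for illustration, not taken from the lexicon).
pairs = [("ship", ["SH", "IH", "P"]), ("she", ["SH", "IY"]),
         ("pin", ["P", "IH", "N"]), ("he", ["HH", "IY"])]
probs, best = em_align(pairs)
```

On this toy data the unambiguous words "pin" and "he" pull the mappings of IH, P and IY towards single letters, which in turn makes the sketch prefer aligning SH with the letter pair "sh" in "ship" and "she".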
The system's accuracy is recorded after the maturity of every independent module shown in figure 3. A lexicon of the most frequently used English words (15,237 words from the British National Corpus (BNC)) was transliterated into Urdu using the transliteration system. The accuracy without applying syllabification and without resolving the unknown word problem is described in detail in table 7. The results were generated by passing the transliterated text to an Urdu text-to-speech system and analyzing its output.

Table 7: English-to-Urdu mapping accuracy
Observations                                  Total Size
Correct Mapping (after applying rules)        12,940
Incorrect Mapping (due to Syllabification)    173
Incorrect Mapping (due to OOV)                2,124
Total                                         15,237
Accuracy (%)                                  84.92

After applying the syllabification technique, 91% of the 173 syllabification problems are resolved (manual testing). The accuracy of the OOV module is evaluated automatically using the Bilingual Evaluation Understudy (BLEU) method [10], as shown in table 8.

Table 8: Overall system accuracy
Modules           Correct   Total Size   Accuracy (%)
Mapping           12,940    12,940       100.00
Syllabification   158       173          91.31
OOV               1,518     2,124        72.46
Total             14,616    15,237       95.92

6. Conclusion

Transliteration is a useful technique that helps add multilingual ability to a system. It can be used in various systems, e.g. text-to-speech, information retrieval, machine translation, and consistency of proper names in English-to-Urdu parallel corpora. The overall system accuracy is 96%, which is quite promising. The system can be improved by training the transliteration model on the Urdu accent instead of the American accent.

7. Acknowledgements

The work on the English to Urdu transliteration system has been carried out in a project that involves development of an open-source Urdu screen reader for visually impaired people, funded by the National University of Computer and Emerging Sciences (NUCES), Pakistan.

8. References

[1] W. Gao, K. F. Wong and W. Lam, "Phoneme-based Transliteration of Foreign Names for OOV Problem". In First International Joint Conference on Natural Language Processing, Pages 374-381, 2004.

[2] M. Saleem, "Urdu Rasmulkhat ki Jaamiat". Akhbar-i-Urdu, Pages 6-10, Islamabad, Pakistan, 2002.

[3] S. Hussain, "Letter-to-Sound Rules for Urdu Text to Speech System". Proceedings of Workshop on Computational Approaches to Arabic Script-based Language, COLING-2004, Geneva, Switzerland, 2004.

[4] S. Hussain, "Phonological Processing for Urdu Text to Speech System". Yadava, Y, Bhattarai, G, Lohani, RR, Prasain, B and Parajuli, K (eds.) Contemporary issues in Nepalese linguistics. Kathmandu, Linguistic Society of Nepal, 2005.

[5] J. Kominek and A. W. Black, "Learning Pronunciation Dictionaries: Language Complexity and Word Selection Strategies". In Proceedings of the Human Language Technology Conference of the NAACL, Pages 232-239, New York City, USA, 2006.

[6] J. Lewis, K. McGrath and J. Reuppel, "Language Identification and Language Specific Letter-to-Sound Rules". Colorado Research in Linguistics, Volume 17, Issue 1, June 2004.

[7] J. Martin, R. Mihalcea and T. Pedersen, "Word Alignment for Languages with Scarce Resources". In Proceedings of the ACL Workshop on Building and Exploiting Parallel Texts: Data Driven Machine Translation and Beyond, Ann Arbor, MI, June 2005.

[8] A. Sen, "Pronunciation Rules for Indian English TTS System". Workshop on Spoken Language Processing, Mumbai, India, January 2003.

[9] R. Bokhari and S. Pervez, "Syllabification and Re-Syllabification in Urdu". Akhbar-i-Urdu, Pages 63-67, Islamabad, Pakistan, 2003.

[10] K. Papineni, S. Roukos, T. Ward and W. J. Zhu, "Bleu: a Method for Automatic Evaluation of Machine Translation". Proceedings of the International Conference on Spoken Language Processing (ICSLP), Pages 901-904, 2002.

[11] CMU, "The CMU Pronunciation Dictionary", www.speech.cs.cmu.edu/cgi-bin/cmudict, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 2006.
Appendix A - English-to-Urdu Mappings

Arpabet  IPA  Urdu (by position in the word)
AA       ɑ    initial آ, middle ◌َا, final ◌َا
AE       æ    initial اَى, middle ◌َى, final ے
AH       ʌ    initial اَ, middle ◌َ, final ◌َا
AO       ɔ    initial اَو, middle ◌َو, final ◌َو
AW       aʊ   initial آؤ, middle ◌َاؤ, final ◌َاؤ
AY       aɪ   [illegible]
B        b    ب in all positions
CH       ʧ    چ in all positions
D        d    ڈ in all positions
DH       ð    د in all positions
EH       ɛ    initial اَى, middle ◌َى, final ے
ER       ɝ    initial اَر, middle ◌َر, final ◌َر
EY       eɪ   initial اى, middle ى, final ے
F        f    ف in all positions
G        g    گ in all positions
HH       h    ح in all positions
IH       ɪ    initial اِ, middle ◌ِ, final ◌ِ
IY       i    initial اِى, middle ◌ِى, final ◌ِى
JH       ʤ    ج in all positions
K        k    ك in all positions
L        l    ل in all positions
M        m    م in all positions
N        n    ن in all positions
NG       ŋ    [illegible]
OW       oʊ   initial او, middle و, final و
OY       ɔɪ   [illegible]
P        p    پ in all positions
R        ɹ    ر in all positions
S        s    س in all positions
SH       ʃ    ش in all positions
T        t    ٹ in all positions
TH       θ    [illegible]
UH       ʊ    initial اُ, middle ◌ُ, final ◌ُ
UW       u    initial اُو, middle ◌ُو, final ◌ُو
V        v    و in all positions
W        w    و in all positions
Y        j    ى in all positions
Z        z    ز in all positions
ZH       ʒ    ژ in all positions
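A position-dependent lookup over the mappings of Appendix A can be sketched in Python as follows. This is an illustrative sketch only: the table is a small excerpt of the appendix, the names `VOWEL_FORMS`, `CONSONANT_FORMS` and `to_urdu` are hypothetical, and "initial" and "final" are assumed to mean the first and last phoneme of the word.

```python
ZABAR, ZER = "\u064e", "\u0650"   # short-vowel diacritics
ALIF, YEH = "\u0627", "\u0649"    # alif, choti yeh
RE = "\u0631"

# Vowels have (initial, middle, final) forms; excerpt of Appendix A.
VOWEL_FORMS = {
    "AH": (ALIF + ZABAR, ZABAR, ZABAR + ALIF),
    "ER": (ALIF + ZABAR + RE, ZABAR + RE, ZABAR + RE),
    "IH": (ALIF + ZER, ZER, ZER),
    "IY": (ALIF + ZER + YEH, ZER + YEH, ZER + YEH),
}

# Consonants use the same letter in every position; excerpt of Appendix A.
CONSONANT_FORMS = {
    "B": "\u0628",   # be
    "JH": "\u062c",  # jeem
    "L": "\u0644",   # lam
    "N": "\u0646",   # noon
    "S": "\u0633",   # seen
    "T": "\u0679",   # retroflex te
}


def to_urdu(phonemes):
    """Map Arpabet phonemes to Urdu script, choosing vowel forms by position."""
    out = []
    last = len(phonemes) - 1
    for i, phone in enumerate(phonemes):
        if phone in VOWEL_FORMS:
            initial, middle, final = VOWEL_FORMS[phone]
            out.append(initial if i == 0 else final if i == last else middle)
        else:
            out.append(CONSONANT_FORMS[phone])
    return "".join(out)
```

For example, `to_urdu(["ER", "B", "AH", "N"])` uses the initial form of ER and the medial form of AH, giving the transliteration of "urban" discussed in Section 2. A full implementation would cover every row of the table and apply the syllabification and Urduization rules before the lookup.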
Appendix B - English Phonemic Inventory

Vowels (Arpabet | IPA)
Closed:          IY | i, IH | ɪ (front); UW | u, UH | ʊ (back)
Closed-Middle:   EY | eɪ (front); OW | oʊ (back)
Middle:          OY | ɔɪ
Open-Middle:     EH | ɛ, AE | æ (front); ER | ɝ (central); AH | ʌ, AO | ɔ (back)
Open:            AY | aɪ, AW | aʊ, AA | ɑ

Consonants (Arpabet | IPA)
Plosive:               P | p, B | b (bilabial); T | t, D | d (alveolar); K | k, G | g (velar)
Nasal:                 M | m (bilabial); N | n (alveolar); NG | ŋ (velar)
Affricate:             CH | ʧ, JH | ʤ (post-alveolar)
Fricative:             F | f, V | v (labio-dental); TH | θ, DH | ð (dental); S | s, Z | z (alveolar); SH | ʃ, ZH | ʒ (post-alveolar); HH | h (glottal)
Approximant:           W | w (labio-velar); R | ɹ (alveolar); Y | j (palatal)
Lateral Approximant:   L | l (alveolar)
Appendix C - Urdu Phonemic Inventory

Vowels
IPA (the Initial, Middle and Final letter forms are illegible): /ə/, /ɑ/, /ɪ/, /i/, /ʊ/, /u/, /e/, /ɛ/ | /ɑɪ/, /o/, /ɑu/

IPA   Letter   Description
/ə/   ◌َ        Zabar
/ɪ/   ◌ِ        Zer
/ʊ/   ◌ُ        Paish

Consonants (IPA)
Places of articulation: Bilabial, Labio-Dental, Dental, Alveolar, Post-Alveolar, Retroflex, Velar, Uvular | Glottal
Plosive | Stops:   p b, t̪ d̪, ʈ ɖ, k g, q, ʔ; pʰ bʱ, t̪ʰ d̪ʱ, ʈʰ ɖʱ, kʰ gʱ
Nasal:             m mʱ, n nʱ, ŋ ŋʱ
Affricate:         tʃ dʒ; tʃʰ dʒʱ
Fricative:         f v, s z, ʃ ʒ, x ɣ, h
Trill:             r rʱ
Lateral:           l lʱ
Flap:              ɽ ɽʱ
Approximant:       j
Appendix D - Transliteration

[Table of English words with their Urdu transliterations. The Urdu-script column is illegible; the English words are listed below.]

abandon, abilities, abuse, accelerated, acknowledgment, ascent, aspiration, authoritative, babies, banned, belgium, boxing, brackets, bradford, bravery, bribes, chilly, chocolate, computer, dangerously

daniel, expressly, financial, flourishing, gradually, handling, interview, iran, justifications, kennedy, lithuania, luxembourg, mathematics, maxwell, morphological, neighbouring, nominating, outstanding, physiological, projects

sponsorships, strategies, syria, systematic, thriving, turkey, urban, urgently, veterinary, visually, vulgar, wellington, williams, workshops, wrapping, yorkshire, youth, yugoslavia, zimbabwe, zoology