
VoiceXML Technology

Andrea Piras (piras@crs4.it), Guido Zucconi (guido@crs4.it)
03/09/2001

Contents

VoiceXML
- What's VoiceXML?
- Advantages from the web
- Advantages from SR
- Advantages from the phone
- Architectural Model
- VoiceXML enables
- Voice apps
- VoiceXML history
- Now
- W3C WG
- VoiceXML Techs
- VoiceXML Techs in Italy

VoiceXML Technology
- Nuance
- Nuance products
- SpeechObjects
- Installation
- Watcher
- Processes
- Access to Watcher
- Watchers
- Launchpad
- About Vocalizer
- Standard system
- Another system
- A complex system
- Good or bad?
- E-mate and VoiceXML?
- Links

What's VoiceXML?

VoiceXML is a Web-based markup language for representing human-computer dialogs that use audio output (computer-synthesized and/or recorded) and audio input (voice and/or keypad tones).
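
As a rough illustration of what such a dialog document looks like, the sketch below uses Python's standard xml.etree.ElementTree module to build a minimal VoiceXML 1.0 document whose single form speaks one prompt. The function name build_hello_vxml and the prompt text are assumptions made for this example; a real application would also declare fields and grammars for the audio input side.

```python
import xml.etree.ElementTree as ET


def build_hello_vxml(prompt_text: str) -> str:
    """Return a minimal VoiceXML 1.0 document as a string.

    build_hello_vxml is a hypothetical helper for this example only.
    """
    # Root element of every VoiceXML document.
    vxml = ET.Element("vxml", {"version": "1.0"})
    # A form is the basic dialog unit; a block holds content to be played.
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    block.text = prompt_text
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_hello_vxml("Hello World!"))
```

A voice browser that loads this document would synthesize the text in the block and then end the dialog, since the form collects no input.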

Advantages from the web
Advantages taken from the web:
- improved web server capabilities
- more powerful browsers
- advanced web data representation (XML)
- more powerful web application development tools
- an internet infrastructure that is improving in performance, bandwidth, and quality of service
- the growth of the World-Wide Web and of its capabilities

Advantages from SR
Advantages taken from Speech Recognition:
- better algorithms and acoustic models that require less powerful hardware
- speech synthesizers closer to human speech
- improvements in computer-based speech recognition and text-to-speech synthesis

Advantages from the phone
Advantages taken from the phone:
- widespread
- portable
- instant-on
- usable while driving, with an earphone :-)

Architectural Model

Document Server (e.g. a web server)
processes requests from a client

VoiceXML Interpreter
processes VoiceXML documents and conducts the dialog

VoiceXML Interpreter Context
acquires VoiceXML documents, detects and answers calls

Implementation Platform
- controlled by the VoiceXML Interpreter Context and the VoiceXML Interpreter
- generates events in response to user actions and system events
- requires audio output (TTS, audio files) and audio input (SR, audio recording, DTMF)
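
The division of roles above can be sketched as a toy Python program. This is only an illustration under simplifying assumptions: the document server is an in-memory dictionary, the platform records prompts in a list instead of playing audio, and the interpreter handles nothing but block elements. All names (DOCUMENTS, document_server, ImplementationPlatform, interpret, interpreter_context) are hypothetical and do not come from any VoiceXML product.

```python
import xml.etree.ElementTree as ET

# Stand-in for the Document Server: maps a request path to a document.
DOCUMENTS = {
    "/hello.vxml": (
        '<vxml version="1.0"><form><block>Hello World!</block></form></vxml>'
    ),
}


def document_server(path: str) -> str:
    """Answer a client request with the stored VoiceXML document."""
    return DOCUMENTS[path]


class ImplementationPlatform:
    """Stand-in for the audio hardware: records what would be spoken."""

    def __init__(self) -> None:
        self.spoken = []

    def play(self, text: str) -> None:
        self.spoken.append(text)


def interpret(document: str, platform: ImplementationPlatform) -> None:
    """VoiceXML Interpreter: walk the document and conduct the dialog."""
    root = ET.fromstring(document)
    for block in root.iter("block"):
        platform.play((block.text or "").strip())


def interpreter_context(path: str, platform: ImplementationPlatform) -> None:
    """Interpreter Context: acquire the document, then hand it over."""
    document = document_server(path)
    interpret(document, platform)


if __name__ == "__main__":
    platform = ImplementationPlatform()
    interpreter_context("/hello.vxml", platform)
    print(platform.spoken)
```

The point of the sketch is the direction of the calls: the context fetches the document, the interpreter drives the dialog, and both act on the platform, which owns audio input and output.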

VoiceXML enables
- Voice applications that are easy to develop.
- Applications that are easy to deliver, because they do not require special web servers.
- Applications that work with computers and telephones alike.

Voice apps
- Information retrieval
- Electronic transactions
- Telephone services
- Call centers
- Voice e-mail
- Voice Access Control
- Voice Recognition

VoiceXML history
- 1995: AT&T Bell Labs, PML / PhoneWeb (Ramming, Rehor, Ladd, Tuckey)
- 1-2/1999: PML, VoxML, SpeechML
- 3/1999: VoiceXML Forum founded
- 8/1999: VoiceXML 0.9
- 3/2000: VoiceXML 1.0
- 5/2000: VoiceXML 1.0 accepted by the W3C

Now
VoiceXML 2.0

W3C WG
Voice Browser Working Group: Speech Recognition Grammars, Speech Synthesis Markup Language, Natural Language Semantics Markup Language, Multimodal Dialog Markup Language

VoiceXML Techs

WebSphere Voice Server SDK - IBM
TTS, ASR, browser

Mya Voice Platforms - Motorola
gateway, TTS, ASR, browser; only the Mobile Applications Development Kit can be downloaded

Voice Web Application Platform - Telera
voice browser and voice web server; developed TXML before using VoiceXML; after registering, VoiceXML code can be checked online; no download; California


MagicTalk Voice Gateway - General Magic
integrates VoiceXML, speech recognition, and telephony technologies to enable voice access; no download; California

Bevocal Cafe
after registering, VoiceXML code can be checked online; no download; California

Enterprise VoiceXML Server - Tellme
after registering in Tellme Studio, VoiceXML code and grammars can be checked online and the application can be listened to by phone; no download; California

Natural Voices - AT&T Labs
high-quality TTS, testable online; no download

Mosquito - Minde
voice platform; no download; Utah

VoiceGenie
server, browser and applications; after registering in the Developer Workshop, VoiceXML code and grammars can be checked online and the application can be listened to by phone; no download; Toronto


VoiceXML Techs in Italy

VoxNauta - Loquendo
voice platform; no download

VoceViva - Tiscali
voice platform, good TTS and SR; no download

Nuance
A Californian software house with a complete suite of VoiceXML products. After registering, it is possible to download almost all products, test VoiceXML code online, access the discussion groups and read the support guides.


Nuance products

Nuance 7.0
distributed architecture platform used by the other Nuance products; supports 25 languages

Vocalizer
TTS, available in 9 languages

V-Builder
graphical tool for easily creating VoiceXML applications

Verifier
voiceprint identification software

V-Optimizer
tool for analyzing and tuning deployed applications

Voyager
voice browser, 80% compatible with VoiceXML 1.0

Voice Web Server
web server containing a browser fully compatible with VoiceXML 1.0

Grammar Builder
graphical tool that enables developers to create, view, edit, manage, and test grammars

Nuance Foundation SpeechObjects
Nuance extension of SpeechObjects


SpeechObjects
Created within the V-Commerce Alliance to bring natural language to e-commerce, SpeechObjects (SO) are Java packages for voice applications. They define the speech channel and the grammar handling.

The source code is FREE.

Installation
Installing the platform requires:

Component                                              Installation   Installed
Nuance 7.0.4 - Service Pack 9 and SpeechObjects 1.1    308 Mbyte      461 Mbyte
Vocalizer 1.0 - Service Pack 1                         248 Mbyte      273 Mbyte
V-Builder 1.2                                          27 Mbyte       53 Mbyte
Voice Web Server 1.2                                   12 Mbyte       26 Mbyte
TOTAL INSTALLED                                                       813 Mbyte
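
The disk-space total can be checked with a few lines of Python. This is just the arithmetic from the figures on this slide; the dictionary layout and the function name totals are choices made for the example.

```python
# (installation package size, installed size) in Mbyte, as listed on the slide.
SIZES_MBYTE = {
    "Nuance 7.0.4 SP9 + SpeechObjects 1.1": (308, 461),
    "Vocalizer 1.0 SP1": (248, 273),
    "V-Builder 1.2": (27, 53),
    "Voice Web Server 1.2": (12, 26),
}


def totals(sizes):
    """Return (total installation size, total installed size) in Mbyte."""
    installation = sum(pkg for pkg, _ in sizes.values())
    installed = sum(inst for _, inst in sizes.values())
    return installation, installed


if __name__ == "__main__":
    installation, installed = totals(SIZES_MBYTE)
    print(f"installation packages: {installation} Mbyte")  # 595
    print(f"installed:             {installed} Mbyte")     # 813
    # Installed footprint without the V-Builder authoring tool.
    runtime = installed - SIZES_MBYTE["V-Builder 1.2"][1]
    print(f"installed, no V-Builder: {runtime} Mbyte")      # 760
```

The installed sizes add up to the 813 Mbyte quoted above; leaving out V-Builder gives 760 Mbyte.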


Watcher
The Watcher is a daemon/service that can start, stop, quiesce and monitor the processes inside the Nuance platform, and get and set their parameters; it uses port 7890. A Watcher process must run on each machine that is to be monitored, and a Watcher can communicate with the other Watchers. The processes launched by default are: license manager, resource manager, recognition server, recognition client, compilation server.

Processes

Recognition Server (RecServer.exe)
- listens for incoming connection requests from recognition clients
- starts one thread per CPU
- works on port 8200

Compilation Server (compilation-server.exe)
- compiles the grammars
- works on port 2527

License Manager (nlm.exe)
- manages floating licenses across the machines in the network
- works on port 8470

Resource Manager (resource-manager.exe)
- manages the requests of the other processes
- does not connect more than 1000 channels
- works on port 7777
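
With this many fixed ports, a quick reachability check can help when a process does not come up. The sketch below is a generic TCP probe written with Python's standard socket module; it is not a Nuance tool. It assumes the processes accept plain TCP connections on the ports quoted on the Watcher and Processes slides, and the names DEFAULT_PORTS, is_listening and port_report are hypothetical.

```python
import socket

# Default ports quoted on the Watcher and Processes slides.
DEFAULT_PORTS = {
    "watcher": 7890,
    "recognition server": 8200,
    "compilation server": 2527,
    "license manager": 8470,
    "resource manager": 7777,
}


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat as not listening.
        return False


def port_report(host: str = "localhost") -> dict:
    """Probe every default port on the given host."""
    return {
        name: is_listening(host, port)
        for name, port in DEFAULT_PORTS.items()
    }


if __name__ == "__main__":
    for name, up in port_report().items():
        print(f"{name:20s} {'listening' if up else 'not reachable'}")
```

A successful connection only shows that something is listening on the port, not that it is the expected Nuance process.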


Recognition Client (RecClient.exe)
- the entry point for applications
- performs audio playback and recording, and controls telephony applications
- supports the audio providers: native (SB), dialogic (telephony board by Dialogic), nms (telephony hardware), aculab (telephony board by Aculab), h323 (Voice over IP - VoIP)
- supports multiple applications and can run applications remotely
- the maximum number of threads can be specified
- 1 recclient for every 10 ports and 1 thread for every 4 channels
- works on port 9200
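
The sizing rule for the recognition client can be turned into a small calculation. The sketch below reads the rule as a ratio and rounds up, which is an assumption; the constant and function names are hypothetical.

```python
import math

# Ratios quoted on the slide.
PORTS_PER_RECCLIENT = 10
CHANNELS_PER_THREAD = 4


def recclients_needed(ports: int) -> int:
    """Number of recclient processes for the given number of ports."""
    return math.ceil(ports / PORTS_PER_RECCLIENT)


def threads_needed(channels: int) -> int:
    """Number of threads for the given number of channels."""
    return math.ceil(channels / CHANNELS_PER_THREAD)


if __name__ == "__main__":
    # Example: a 24-line telephony board.
    print(recclients_needed(24))  # 3
    print(threads_needed(24))     # 6
```

Under this reading, a 24-line board would need 3 recclients and 6 threads.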

Access to Watcher
[Figure: access to the Watcher; diagram labels 7161, 7080 and 7023]


Http Watcher

Watchers
Each watcher can communicate with the other watchers present on the network.


Launchpad

Launchpad is a graphical tool that communicates with all watchers through a single interface.

To start:
>cd %Nuance%/java
>java -cp launchpad.jar nop.frontend.GUI.GUI


About Vocalizer

By default, Vocalizer works on port 32323. To run several TTS servers on the same machine, each one must be given a name and its own port.
Ex:
vocalizer tts.resourceName=americanVoice
vocalizer -language italian tts.ResourceName=italianVoice tts.Port=32324
vocalizer -language french tts.ResourceName=frenchVoice tts.Port=32325

Good English and French, bad Italian.
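
When several voices share one machine, the port bookkeeping can be generated rather than typed by hand. The sketch below only assembles command strings in the pattern of the examples above and does not run anything. It assumes consecutive ports starting right after the default 32323 and uses the tts.ResourceName spelling found in two of the three examples; the function name vocalizer_commands is hypothetical.

```python
DEFAULT_PORT = 32323  # port used by the first, default TTS server


def vocalizer_commands(voices, first_port=DEFAULT_PORT + 1):
    """Build one command line per (language, resource name) pair.

    Each additional TTS server gets its own name and the next free port.
    """
    commands = []
    for offset, (language, resource_name) in enumerate(voices):
        commands.append(
            f"vocalizer -language {language} "
            f"tts.ResourceName={resource_name} "
            f"tts.Port={first_port + offset}"
        )
    return commands


if __name__ == "__main__":
    extra_voices = [("italian", "italianVoice"), ("french", "frenchVoice")]
    for line in vocalizer_commands(extra_voices):
        print(line)
```

For the two extra voices shown, this reproduces the Italian and French command lines from the slide.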

Standard system


Another system

A complex system


Good or bad?

Good:
- High flexibility and scalability
- Completely FREE
- Many languages supported
- Can be used with telephone boards

JAVA

Bad:
- High number of ports used
- Hardware resources
- No telephone simulation
- Many variables
- Disk space
- Speech Recognition

E-mate and VoiceXML?

Each E-mate service will be able to become a voice service; this can be done by extending the Object Browser to use VoiceXML. At present, the only free voice platform is Nuance. Is it possible to install a platform supporting all of VoiceXML 1.0 during E-mate installation? Not simple, but YES. Is it risky? YES: it requires installing 760 Mbyte of third-party software.


Links
VoiceXML Forum
http://www.voicexml.org

VoiceXML Central
http://www.voicexmlcentral.com

General Magic
http://www.generalmagic.com

IBM WebSphere Voice Server SDK Version 1.5
http://www-4.ibm.com/software/speech/enterprise/ep_11.html

Mobile Application Development Toolkit
http://www.motorola.com/MIMS/ISG/spin/mix/

Telera
http://www.telera.com

Tellme
http://www.tellme.com, http://studio.tellme.com

VoiceGenie
http://developer.voicegenie.com

V-Commerce Alliance
http://www.v-commerce.org

Natural Voices - AT&T Labs
http://www.naturalvoices.att.com

Mosquito
http://www.minde.com

Bevocal Cafe
http://cafe.bevocal.com

Tiscali VoceViva
http://voceviva.tiscali.it

Loquendo
http://www.loquendo.it

Nuance
http://www.nuance.com, http://extranet.nuance.com
Nuance Vocalizer 1.0 Nuance Vocalizer Developer's Guide
Nuance Voice Web Server Version 1.2 Installation Guide
Nuance Voyager Version 1.0 Voice Browser Installation Guide
Nuance Speech Recognition System Application Developer's Guide
Speech Object & VoiceXML
