
Speech Recognition

Submitted to: R. K. Nagar
Submitted by: Priyanka Tomer, E.I.E. (6th Sem), 0822932027

Certificate
NAME: Priyanka Tomer
ROLL NO: 0822932027
CLASS: E.I.E.
SEM: 6th

INSTITUTE : Vidya College Of Engineering

This is certified to be the bonafide work of the student on the topic SPEECH RECOGNITION for the seminar during the academic year 2010-2011.

TEACHER IN-CHARGE: R. K. Nagar

Acknowledgement
I am very thankful to everyone who supported me, enabling me to complete my work effectively and on time. I am equally grateful to my teacher, Mr. R. K. Nagar, who gave me moral support and guided me in matters regarding the topic. He was very kind and patient while suggesting the outline of my topic and clearing my doubts, and I thank him for his overall support. Last but not least, I would like to thank my parents who, despite their busy schedules, helped me a lot in gathering information, collecting data and guiding me from time to time, and gave me different ideas to make my work unique. Thank you. Priyanka Tomer, E.I.E. (6th Sem), 0822932027

Contents
Introduction
History
Voice Recognition Software
Enrolment
Detection and Correction
Classification of Speech Recognition Systems
How to Create a Voice Recognition System
Framework for Authentication
Applications
Advantages and Disadvantages
Conclusion

HISTORY
While AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that used discrete speech: verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy. Speech recognition is used in a wide range of applications, satisfying the needs of consumers and businesses by simplifying customer interaction, increasing efficiency, and reducing operating costs. According to industry figures provided by Allied Business Intelligence (ABI), the increased popularity of speech recognition will push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, look for more and more companies to move toward speech recognition, and watch the industry take its place as a leader in the technology sector.

Introduction

Speech recognition is a technique used in speech processing by which human speech is detected and converted into a machine-readable form, such as text.

What Is Voice Recognition?


Voice recognition is an alternative to typing on a keyboard. Put simply, you talk to the computer and your words appear on the screen. The software was developed to provide a fast method of writing on a computer and can help people with a variety of disabilities. It is useful for people with physical disabilities who often find typing difficult, painful or impossible. Voice recognition software can also help those with spelling difficulties, including users with dyslexia, because recognised words are always correctly spelled.

[Figure: different types of voice sensors]

Voice Recognition Software


Voice recognition software programs work by analysing sounds and converting them to text. They also use knowledge of how English is usually spoken to decide what the speaker most probably said. Once correctly set up, the systems should recognise around 95% of what is said if you speak clearly. Several programs are available that provide voice recognition. These systems work best on Windows XP and Windows Vista. A number of voice recognition programs can be used with Windows, including the one supplied with Windows Vista. Most specialist voice applications include a software CD, a microphone headset, a manual and a quick reference card. You connect the microphone to the computer, either into the soundcard (sockets on the back of the computer) or via a USB connection. Then you can begin talking, following the steps below.
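The idea of using "knowledge of how English is usually spoken" can be sketched as a tiny language model: when two candidate transcriptions sound alike, the software prefers the word sequence that occurs more often in ordinary English. The bigram counts below are made-up illustrative values, not real corpus statistics.

```python
# Toy sketch: choose between acoustically similar candidate
# transcriptions using bigram (word-pair) frequencies.
# The counts are invented for illustration only.

BIGRAM_COUNTS = {
    ("<s>", "recognise"): 45,
    ("recognise", "speech"): 50,
    ("speech", "</s>"): 60,
    ("<s>", "wreck"): 3,
    ("wreck", "a"): 5,
    ("a", "nice"): 40,
    ("nice", "beach"): 30,
    ("beach", "</s>"): 25,
}

def score(sentence):
    """Sum bigram counts over consecutive word pairs (higher = more likely)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(BIGRAM_COUNTS.get(pair, 0) for pair in zip(words, words[1:]))

def pick_best(candidates):
    """Return the candidate transcription with the highest score."""
    return max(candidates, key=score)

best = pick_best(["recognise speech", "wreck a nice beach"])
```

With these counts, "recognise speech" wins over the acoustically similar "wreck a nice beach"; real engines use the same principle with probabilities learned from very large text corpora.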

Enrolment
Everybody sounds slightly different, so the first step in using a voice recognition system involves reading an article displayed on the screen. This process, called enrolment, takes less than 10 minutes and creates a set of files that tell the software how you speak. Many newer voice recognition programs say this is not required; however, it is always worth doing to get the best results. Enrolment only has to be done once, after which the software can be started as needed.

DETECTION AND CORRECTION


When talking, people often hesitate, mumble or slur their words. One of the key skills in using voice recognition software is learning how to talk clearly so that the computer can recognise what you are saying. This means planning what to say and then delivering speech in complete phrases or sentences. The voice recognition software will misunderstand some of the words spoken, so it is necessary to proof-read and then correct mistakes. Corrections can be made using the mouse and keyboard or by voice. When corrections are made, the voice recognition software adapts and learns, so that (hopefully) the same mistake will not occur again. Accuracy should improve with careful dictation and correction.
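The "adapt and learn from corrections" idea can be illustrated with a toy sketch: each time the user fixes a misrecognised word, the substitution is remembered and applied to future output. Real engines adapt their acoustic and language models rather than keeping a word dictionary; this is only an analogy.

```python
# Toy analogy for correction-driven adaptation: remember each
# user correction and apply it to future recognised text.

class Corrector:
    def __init__(self):
        self.fixes = {}  # misrecognised word -> corrected word

    def learn(self, wrong, right):
        """Record a correction the user has made."""
        self.fixes[wrong] = right

    def apply(self, text):
        """Replace any previously corrected words in new output."""
        return " ".join(self.fixes.get(w, w) for w in text.split())

c = Corrector()
c.learn("there", "their")        # user corrects one misrecognition
fixed = c.apply("there car")     # the same mistake is fixed next time
```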

Classification of Speech Recognition Systems

Isolated voice recognition system: This type of input requires the user to pause between words so that the computer can distinguish the beginning and end of each word. Although your speech has to be modified slightly, which slows your regular dictation, you can achieve well over 80 WPM, the speed of an advanced typist; some users have even reported speeds of up to 125 WPM.
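How the pauses let the computer find word boundaries can be sketched with a simple energy threshold: stretches of the signal louder than the threshold are treated as words, and the quiet gaps between them as pauses. The signal values below are synthetic stand-ins for real microphone samples.

```python
# Sketch of isolated-word endpoint detection: split a signal into
# word segments wherever the amplitude falls below a silence threshold.

def split_on_silence(samples, threshold=0.1):
    """Return (start, end) index pairs of segments louder than threshold."""
    segments, start = [], None
    for i, s in enumerate(samples):
        loud = abs(s) > threshold
        if loud and start is None:
            start = i               # a word begins
        elif not loud and start is not None:
            segments.append((start, i))  # a word ends at the pause
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments

# Two "words" separated by a pause of near-silent samples.
signal = [0.0, 0.5, 0.6, 0.5, 0.0, 0.0, 0.0, 0.4, 0.7, 0.0]
words = split_on_silence(signal)   # two segments found
```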

Continuous voice recognition system: This system does not require brief pauses between spoken words. The technology is currently available from very few vendors, and only for very small vocabularies (around 2,000 words) and numbers. This speech input requires the user to say only words that are known to the system, and you are limited by the expandability of its libraries. The technology is currently not useful for dictation, but it is very useful for specific functions or programs, e.g. data-entry systems.

Speaker dependent recognition system: This system can recognize speech from only one speaker; if the speaker changes, it cannot recognize the voice. The technology requires users to participate in extensive training exercises that can last several hours. Once you are done "drilling the machine", the computer performs calculations on the data gathered from your exercises and builds a voice profile that it attempts to match against your voice.
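A minimal sketch of the voice-profile idea, under the assumption that each training utterance has already been reduced to a feature vector (the numbers here are invented): the vectors are averaged into one profile, and new input is accepted only when it lies close to that profile.

```python
# Toy speaker-dependent profile: average the training feature
# vectors, then accept input only if it is close to the profile.
# All feature values are made-up illustrative numbers.

def average(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# "Training exercises": several utterances from the enrolled speaker.
training = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
profile = average(training)            # the stored voice profile

def accept(features, profile, tolerance=0.5):
    """True only when the input is close to the enrolled speaker."""
    return distance(features, profile) < tolerance

same_speaker = accept([1.1, 2.0], profile)   # close to the profile
other_speaker = accept([5.0, 5.0], profile)  # a different voice
```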

Speaker independent recognition system: This system can recognize anyone's speech. This technology, on the other hand, does not require a person to go through training exercises; a user may begin using the voice recognition program upon installation.

How to create a voice recognition system


Speech acquisition: Speech acquisition is the process of collecting the speech. For training purposes the speech is acquired using a microphone. The analog speech input must then be converted into digital form; the sound card of the PC converts the analog input to a digital format for further analysis.
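What the sound card does can be sketched in a few lines: it samples the continuous signal at a fixed rate and quantises each sample to a 16-bit integer, the usual PCM format. A 440 Hz sine tone stands in for real speech here, and the sample rate is an illustrative choice.

```python
# Sketch of analog-to-digital conversion: sample a continuous
# signal and quantise each sample to a signed 16-bit integer (PCM).

import math

SAMPLE_RATE = 8000          # samples per second (telephone quality)
FREQ = 440                  # a 440 Hz test tone standing in for speech

def sample_and_quantise(duration_s):
    """Return 16-bit PCM samples of a sine tone of the given duration."""
    pcm = []
    for n in range(int(SAMPLE_RATE * duration_s)):
        t = n / SAMPLE_RATE                        # time of this sample
        analog = math.sin(2 * math.pi * FREQ * t)  # value in [-1, 1]
        pcm.append(int(analog * 32767))            # scale to 16-bit range
    return pcm

samples = sample_and_quantise(0.01)   # 10 ms of digitised signal
```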

Speech analysis: This is the second step in creating a voice recognition system. The first important task in speech analysis is to separate each word from the ambient noise; if noise is not separated properly, errors result. Each spoken word is then compared with the built-in acoustic model or dictionary, which is created by the programmer during the training session. These steps are performed with the help of an efficient speech detection algorithm.
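The comparison against the dictionary can be sketched with dynamic time warping (DTW), a classic technique for matching a spoken word against stored templates even when it is spoken faster or slower than the template. The feature sequences below are invented toy values, not real acoustic features.

```python
# Sketch of template matching with dynamic time warping (DTW):
# the input word is matched to the closest dictionary template.

def dtw(a, b):
    """Minimal alignment cost between two 1-D feature sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])           # local mismatch
            cost[i][j] = d + min(cost[i - 1][j],    # stretch a
                                 cost[i][j - 1],    # stretch b
                                 cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

# Dictionary created during the training session (toy feature sequences).
dictionary = {"yes": [1, 3, 3, 1], "no": [5, 7, 5]}

def recognise(features):
    """Return the dictionary word whose template aligns most cheaply."""
    return min(dictionary, key=lambda word: dtw(features, dictionary[word]))

word = recognise([1, 3, 1])   # a slightly shortened "yes"
```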

User interface development: The final step is to develop a user interface, so that all users can use the system with ease.

For example, the speech recognition interface of Windows 7 is compact.

Framework for Authentication/Interaction


[Block diagram: speakers S1, S2, …, SK, …, SN → Speaker Recognition → Speech Recognition → Parsing and Arbitration]

This is the block diagram of a speech recognition system. The first block is speaker recognition, in which the system recognizes who is speaking; next comes the speech recognition block; and the last block performs parsing and arbitration.

Framework for Authentication/Interaction


[Block diagram (Authentication): "Who is speaking?" — speakers S1, S2, …, SK, …, SN feed the Speaker Recognition block, which identifies the speaker (e.g. Annie, David, Cathy)]

When a speaker speaks, the first block recognizes who is speaking; in this block diagram, Annie, David and Cathy are the possible speakers.

Framework for Authentication/Interaction


[Block diagram (Inferring and execution): "What is he talking about?" — the recognized words "switch to channel nine" are parsed and arbitrated into device commands (Channel → TV, Dim → Lamp, On → TV/Lamp)]

The last block performs parsing and arbitration, matching the recognized instruction against the set of known commands; the task is then carried out according to that instruction.
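The parsing-and-arbitration stage can be sketched as a small keyword-to-device table, echoing the "switch to channel nine" example from the diagram. The device names and grammar here are illustrative assumptions, not part of any real system.

```python
# Sketch of parsing and arbitration: map recognised words to the
# device that should carry out the command. Keywords and device
# names are illustrative assumptions only.

DEVICES = {
    "channel": "TV",   # "channel ..." commands are arbitrated to the TV
    "dim": "Lamp",     # "dim ..." commands go to the lamp
}

def parse(utterance):
    """Return (device, action) for a recognised word sequence."""
    words = utterance.lower().split()
    for keyword, device in DEVICES.items():
        if keyword in words:
            return device, " ".join(words)
    return None, " ".join(words)       # no device claimed the command

device, action = parse("switch to channel nine")
```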

Applications
Health care
In the health care domain, even in the wake of improving speech recognition technologies, medical transcriptionists (MTs) have not yet become obsolete; the services provided may be redistributed rather than replaced. Speech recognition can be implemented in the front end or back end of the medical documentation process. Front-end SR is where the provider dictates into a speech recognition engine, the recognized words are displayed right after they are spoken, and the dictator is responsible for editing and signing off on the document; it never goes through an MT/editor. Back-end SR, or deferred SR, is where the provider dictates into a digital dictation system, the voice is routed through a speech recognition machine, and the recognized draft document is routed along with the original voice file to the MT/editor, who edits the draft and finalizes the report. Deferred SR is widely used in the industry currently. Many Electronic Medical Records (EMR) applications can be more effective and may be performed more easily when deployed in conjunction with a speech recognition engine. Searches, queries, and form filling may all be faster to perform by voice than by using a keyboard. Healthcare solutions are usually very state-specific; however, some companies adjust their solutions to the needs of concrete markets (e.g. Speech Technology Center in Russia has a Finnish partner, Vitim OY, with a "Terve Elama" project).


Military

High-performance fighter aircraft


Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), a program in France installing speech recognition systems on Mirage aircraft, and programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft, with applications including setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays. Working with Swedish pilots flying in the JAS-39 Gripen cockpit, Englund (2004) found recognition deteriorated with increasing g-loads. It was also concluded that adaptation greatly improved the results in all cases, and introducing models for breathing was shown to improve recognition scores significantly. Contrary to what might be expected, no effects of the broken English of the speakers were found. It was evident that spontaneous speech caused problems for the recognizer, as could be expected; a restricted vocabulary and, above all, a proper syntax could thus be expected to improve recognition accuracy substantially. The Eurofighter Typhoon currently in service with the UK RAF employs a speaker-dependent system, i.e. it requires each pilot to create a template. The system is not used for any safety-critical or weapon-critical tasks, such as weapon release or lowering of the undercarriage, but is used for a wide range of other cockpit functions. Voice commands are confirmed by visual and/or aural feedback.
The system is seen as a major design feature in the reduction of pilot workload, and even allows the pilot to assign targets to himself with two simple voice commands, or to any of his wingmen with only five commands. Speaker independent systems are also being developed and are in testing for the F-35 Lightning II (JSF) and the Alenia Aermacchi M-346 Master lead-in fighter trainer. These systems have produced word accuracies in excess of 98%.

The F-35 is the first U.S. fighter aircraft with a voice recognition system able to hear pilots' spoken commands and manage various aircraft subsystems, such as communication and navigation.

Helicopters
The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the jet fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone. Substantial test and evaluation programs of speech recognition system applications in helicopters have been carried out in the past decade, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma helicopter, and there has also been much useful work in Canada. Results have been encouraging, and voice applications have included control of communication radios, setting of navigation systems, and control of an automated target handover system. As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. Much remains to be done, both in speech recognition and in overall speech recognition technology, in order to consistently achieve performance improvements in operational settings.

Battle Management
Battle management command centres generally require rapid access to and control of large, rapidly changing information databases. Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. Human-machine interaction by voice has the potential to be very useful in these environments. A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments. In one feasibility study, speech recognition equipment was tested in conjunction with an integrated information display for naval battle management applications. Users were very optimistic about the potential of the system, although capabilities were limited. Speech understanding programs sponsored by the Defense Advanced Research Projects Agency (DARPA) in the U.S. have focused on this problem of the natural speech interface. Speech recognition efforts have focused on a database of continuous, large-vocabulary speech designed to be representative of the naval resource management task. Significant advances in the state of the art in continuous speech recognition (CSR) have been achieved, and current efforts are focused on integrating speech recognition and natural language processing to allow spoken-language interaction with a naval resource management system.

Training air traffic controller


Training for air traffic controllers (ATC) represents an excellent application for speech recognition systems. Many ATC training systems currently require a person to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the dialog the controller would have to conduct with pilots in a real ATC situation. Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as pseudo-pilot, thus reducing training and support personnel. In theory, air traffic controller tasks are characterized by highly structured speech as the primary output of the controller, which should reduce the difficulty of the speech recognition task; in practice, this is rarely the case. The FAA document 7110.65 details the phrases that should be used by air traffic controllers. While this document gives fewer than 150 examples of such phrases, the number of phrases supported by one simulation vendor's speech recognition system is in excess of 500,000. The USAF, USMC, US Army, US Navy and FAA, as well as a number of international ATC training organizations such as the Royal Australian Air Force and the civil aviation authorities in Italy, Brazil and Canada, are currently using ATC simulators with speech recognition from a number of different vendors.

Telephony and other domains


ASR in the field of telephony is now commonplace, and in the field of computer gaming and simulation it is becoming more widespread. Despite the high level of integration with word processing in general personal computing, however, ASR in the field of document production has not seen the expected increases in use. The improvement of mobile processor speeds made the speech-enabled Symbian and Windows Mobile smartphones feasible. Speech is used mostly as part of the user interface, for creating pre-defined or custom speech commands. Leading software vendors in this field are Microsoft Corporation (Microsoft Voice Command), Digital Syphon (Sonic Extractor), Nuance Communications (Nuance Voice Control), Speech Technology Center, Vito Technology (VITO Voice2Go), Speereo Software (Speereo Voice Translator), and SVOX.

Further applications

Automatic translation
Automotive speech recognition (e.g., Ford Sync)
Telematics (e.g., vehicle navigation systems)
Court reporting (real-time voice writing)
Hands-free computing: voice command recognition computer user interface
Home automation
Interactive voice response
Mobile telephony, including mobile email
Multimodal interaction
Pronunciation evaluation in computer-aided language learning applications
Robotics
Video games, with Tom Clancy's EndWar and Lifeline as working examples
Transcription (digital speech-to-text)
Speech-to-text (transcription of speech into mobile text messages)

ADVANTAGES & DISADVANTAGES


Advantages of using voice recognition technology:

Provides better accuracy than keyboard input
Audio feedback improves data application accuracy
Serves hands-free, eyes-free and real-time input needs very well
High reliability and flexibility
Time-saving data input
Eliminates spelling mistakes

Disadvantages of using voice recognition system:


For some applications, the system tends to have a high false-reject rate due to background noise and other variables
Low signal-to-noise ratio
Overlapping speech
Difficulty differentiating between homonyms
Intensive use of computer power

Conclusion
Human performance figures suggest that there is still enormous room for improvement. To get good efficiency and remove the flaws and weaknesses of a voice recognition system, use a high-quality microphone and a good sound card, train the system properly, and, if possible, work in a quiet environment. At present, several new algorithms are being developed to implement voice recognition systems.

References
www.abilitynet.org.uk
www.tech.purdue.edu
en.wikipedia.org
