
SPELL CHECKER: BASIC AND CONTEXT SENSITIVE

By: K Satish Kumar (07131A0544), B Krishna Chaitanya (07131A0547), K Viswa Sai Raja (07131A0546)

SPELL CHECKER:

A spell checker is an application program that flags words in a document that may not be spelled correctly. Our application corrects misspelled and commonly confused words in text written in a natural language. It can offer several candidate words to replace an unrecognized word in the passage.

Basic Spell Checking

Errors in which the typed word does not exist in the dictionary are known as non-word errors. These errors can be detected and corrected with basic spell checking. Examples: ths instead of this, spel instead of spell.

BASIC SPELL CORRECTION APPROACH:

To perform basic spell checking, we first construct a trie containing all the words in a dictionary. Here a dictionary is simply a list of words stored in a text file.
After the trie is constructed, the input text (a single sentence or several sentences) is split into words, and each word is looked up in the trie. Any word that is not found is added to the misspelling list. Suggestions for the words in this list are generated using edit distance and phonetic distance. For phonetic distance we use the DoubleMetaphone class (an implementation of the Double Metaphone algorithm) from the Apache Commons Codec package, commons-codec-1.3, provided by the Apache Software Foundation.
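The lookup step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes a character-keyed trie, lower-cases every word, and splits the text on anything other than letters and apostrophes. The class name DictionaryTrie and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dictionary trie (illustrative sketch, not the project's implementation).
class DictionaryTrie {
    // One node per character; isWord marks the end of a complete dictionary word.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    // Inserts a word, creating nodes for any characters not yet present.
    void insert(String word) {
        Node node = root;
        for (char c : word.toLowerCase().toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // True only if the whole word was inserted; a mere prefix ("spel" of "spell") is false.
    boolean contains(String word) {
        Node node = root;
        for (char c : word.toLowerCase().toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return node.isWord;
    }

    // Splits the text into words (assumed tokenizer) and collects those missing from the trie.
    List<String> findMisspellings(String text) {
        List<String> misspelled = new ArrayList<>();
        for (String token : text.split("[^A-Za-z']+")) {
            if (!token.isEmpty() && !contains(token)) {
                misspelled.add(token);
            }
        }
        return misspelled;
    }
}
```

To load the dictionary file, each line could be passed to insert, for example after reading the file with java.nio.file.Files.readAllLines. A HashSet would give the same yes/no answers; a trie additionally allows walking shared prefixes, which can help when generating suggestions.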

HOW OUR SPELL CHECKER DIFFERS FROM A REGULAR SPELL CHECKER:

INPUT

I saw TREI trees in the park

REGULAR SPELL CHECKER

I saw [ TREE | TREK ] trees in the park

INPUT

I saw TREE trees in the park

CONTEXT SENSITIVE SPELL CHECKER

I saw THREE trees in the park

Context Sensitive Spell Checking

Most spell checkers detect only non-word errors. They cannot catch errors such as the one in "Let us meat today" (meat was typed when meet was intended), because every word in the sentence is in the vocabulary. Detecting and correcting spelling mistakes that produce real words of the target language, known as real-word spell checking, is the most challenging task for a spell checking system. Recent research has therefore focused on algorithms that can recognize a misspelled word even when it is in the vocabulary, based on the context of the surrounding words. This kind of spell checking is known as context sensitive spell checking. Indeed, empirical studies have estimated that errors resulting in valid words account for 25% to more than 50% of all errors, depending on the application.

Context Sensitive Spell Check Approach:

To perform context based spell checking, we use a search engine. This can be Google, Yahoo!, Bing, or any other engine whose search results can be accessed through an API. Yahoo! provides an API through which users registered with Yahoo! BOSS can submit an unlimited total number of queries, so we use the search power of Yahoo!.
Yahoo! Search BOSS (Build your Own Search Service) is an initiative in Yahoo! Search to open up Yahoo!'s search infrastructure and enable third parties to build search products that leverage their own data, content, technology, social graph, or other assets.


In this project, we send requests to the Yahoo! BOSS web service to find a possible real-word error in the given sentence. Consider the sentence "Let us meat today". It is sent to the Yahoo! web server as the following queries, each with one word replaced by the wildcard *:

* us meat today
Let * meat today
Let us * today
Let us meat *
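Generating these wildcard queries is mechanical: one query per word position, with that word replaced by "*". The sketch below assumes whitespace tokenization; the class name WildcardQueries is hypothetical, and whether each query must also be quoted as an exact phrase depends on the search API, which is not specified here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the project): one query per word position.
class WildcardQueries {
    static List<String> build(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            String[] masked = words.clone();   // copy so the original words stay intact
            masked[i] = "*";                   // hide the word under test
            queries.add(String.join(" ", masked));
        }
        return queries;
    }
}
```

For "Let us meat today" this produces exactly the four queries listed above, in the same order.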

HOW TO USE THE YAHOO SERVICE:

The Yahoo! web server returns a result count for each query sent. Based on these counts, we estimate which word in the sentence is the likely real-word error.
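The slides do not state the exact rule for turning result counts into a decision. One plausible heuristic is sketched below, and it is an assumption rather than the project's method. If hiding word i makes the phrase far more common on the web ("Let us * today" matches "Let us meet today"), then word i is the suspect. The minRatio margin over the runner-up is also an assumption, intended to avoid flagging sentences where no position stands out. The class name SuspectFinder is hypothetical.

```java
// Hypothetical heuristic (assumption, not the project's exact rule):
// hitCounts[i] is the result count for the query with word i replaced by "*".
class SuspectFinder {
    /** Returns the index of the suspect word, or -1 if no position clearly stands out. */
    static int suspectIndex(long[] hitCounts, double minRatio) {
        int best = -1;
        long bestCount = -1;
        long secondCount = -1;
        // Track the largest and second-largest counts in one pass.
        for (int i = 0; i < hitCounts.length; i++) {
            if (hitCounts[i] > bestCount) {
                secondCount = bestCount;
                bestCount = hitCounts[i];
                best = i;
            } else if (hitCounts[i] > secondCount) {
                secondCount = hitCounts[i];
            }
        }
        if (best < 0 || bestCount <= 0) {
            return -1; // no data, or no query returned any result
        }
        // Require the top count to beat the runner-up by minRatio to reduce false alarms.
        if (secondCount > 0 && bestCount < minRatio * secondCount) {
            return -1;
        }
        return best;
    }
}
```

For example, with made-up counts {120, 95, 48000, 300} for the four queries and minRatio = 10, suspectIndex returns 2, pointing at "meat". The numbers are purely illustrative, not measurements.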

After the error has been detected, we generate suggestions based on features such as edit distance and phonetic distance.
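The edit-distance feature can be sketched with the standard Levenshtein distance, where each insertion, deletion, or substitution costs 1, followed by ranking the dictionary words by that distance. This is an illustration only. The class name, the maxDistance cut-off, and the alphabetical tie-break are assumptions. Phonetic filtering, for example with DoubleMetaphone from commons-codec, is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical suggester (illustrative sketch, not the project's code).
class EditDistanceSuggester {
    // Levenshtein distance using two rolling rows of the dynamic-programming table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns dictionary words within maxDistance of the word, closest first, ties alphabetical.
    static List<String> suggest(String word, List<String> dictionary, int maxDistance) {
        List<String> result = new ArrayList<>();
        for (String candidate : dictionary) {
            if (levenshtein(word, candidate) <= maxDistance) {
                result.add(candidate);
            }
        }
        result.sort(Comparator.comparingInt((String c) -> levenshtein(word, c))
                .thenComparing(Comparator.<String>naturalOrder()));
        return result;
    }
}
```

For example, suggest("spel", [spell, spill, speak, apple], 2) returns [spell, speak, spill]. Scanning the whole dictionary is simple but slow for large word lists. A trie walk that prunes branches exceeding the distance bound is a common optimization.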

During the testing phase, however, we stored the most commonly confused words. Instead of computing these features, we simply check the suspect word against its stored confused counterpart (for example, meat and meet).
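The stored-pairs shortcut amounts to a lookup table plus a comparison of result counts. In the sketch below, the pairs are illustrative examples taken from this document, not the project's actual list. The hitCount function stands in for the search-engine call and is a hypothetical parameter supplied by the caller. The class name ConfusionSets is also hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical table of commonly confused words (entries are examples only).
class ConfusionSets {
    private static final Map<String, List<String>> CONFUSED = new HashMap<>();

    static {
        add("meat", "meet");
        add("tree", "three");
    }

    // Registers each word as an alternative of the other (the relation is symmetric).
    private static void add(String a, String b) {
        CONFUSED.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        CONFUSED.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Stored alternatives for a word, or an empty list if it has none.
    static List<String> alternatives(String word) {
        return CONFUSED.getOrDefault(word.toLowerCase(), Collections.emptyList());
    }

    // Keeps the original word unless an alternative yields a sentence with more hits.
    static String bestReplacement(String[] words, int index, ToLongFunction<String> hitCount) {
        String best = words[index];
        long bestHits = hitCount.applyAsLong(String.join(" ", words));
        for (String alt : alternatives(words[index])) {
            String[] candidate = words.clone();
            candidate[index] = alt;
            long hits = hitCount.applyAsLong(String.join(" ", candidate));
            if (hits > bestHits) {
                best = alt;
                bestHits = hits;
            }
        }
        return best;
    }
}
```

Passing the count source in as a function keeps the sketch independent of any particular search API and easy to test with canned numbers.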

Yahoo BOSS Application ID

MAIL FEATURE:

JavaMail is a Java API used to send and receive e-mail via SMTP, POP3 and IMAP. JavaMail is built into the Java EE platform and is also available as an optional package for Java SE. The JavaMail API provides a platform-independent and protocol-independent framework for building mail and messaging applications. Our project gives users the option to send the spell-checked text to their e-mail account. We use the JavaMail API to send the text to the specified e-mail address.

USE CASE DIAGRAM:

CLASS DIAGRAM:

ACTIVITY DIAGRAM:

SEQUENCE DIAGRAMS:

1. USER APPLICATION:

2. USER APPLICATION - WEB SERVICE:

3. USER APPLICATION (MAIL):

SCREEN SHOTS:

OPEN FILE DIALOGUE:

OPENED DOCUMENT:

REPLACING MISSPELLINGS:

WHAT IF NO SUGGESTION FOUND:

ADD TO DICTIONARY:

CONTEXT SENSITIVE SPELL CHECKING:

MAIL FEATURE:

REQUEST DETAILS DIALOGUE:

MAIL RECEIVED:

CONCLUSION:

This spell checker is useful when text needs rigorous checking, for example a document being sent to senior officials. However, the Yahoo! BOSS API does not permit a large number of requests in a short time, so we introduced a delay of 9 seconds between consecutive requests. Reducing this delay remains future work.
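The delay between consecutive requests can be enforced with a small throttle that each request calls before contacting the web service. This is a sketch: the class name RequestThrottle is hypothetical, and only the 9-second figure comes from the text (new RequestThrottle(9000)). Because the method sleeps while holding the lock, it suits the sequential request loop described here rather than a highly concurrent client.

```java
// Hypothetical throttle (illustrative sketch): enforces a minimum gap between requests.
class RequestThrottle {
    private final long minGapNanos;
    private long lastRequestNanos;
    private boolean hasPrevious = false;

    RequestThrottle(long minGapMillis) {
        this.minGapNanos = minGapMillis * 1_000_000L;
    }

    // Blocks until at least the minimum gap has passed since the previous call returned.
    synchronized void awaitTurn() throws InterruptedException {
        if (hasPrevious) {
            long waitNanos = minGapNanos - (System.nanoTime() - lastRequestNanos);
            if (waitNanos > 0) {
                Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
            }
        }
        lastRequestNanos = System.nanoTime();
        hasPrevious = true;
    }
}
```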

Besides this, the outcome of a check depends heavily on the search engine's results. Sometimes the required pattern does not appear in the search results at all.
A future enhancement is therefore to use our own database. A database of about 200 GB could be built from the Google trigram datasets. The sentence would then be matched against the trigrams to identify the central (suspect) word and offer suggestions based on features such as edit and phonetic distance.
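The proposed trigram matching can be sketched with an in-memory map from "left center right" to a count. Each candidate is scored by the trigram it forms with its neighbours, and the original word is kept unless a candidate scores higher. This is a sketch under stated assumptions: the class and method names are hypothetical, candidates would come from the edit-distance and phonetic features, and a real 200 GB dataset would need a database or on-disk index rather than a HashMap.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trigram scorer (illustrative sketch of the proposed enhancement).
class TrigramChecker {
    private final Map<String, Long> counts = new HashMap<>();

    private static String key(String left, String center, String right) {
        return (left + " " + center + " " + right).toLowerCase();
    }

    // Adds a count for one trigram; repeated entries are summed.
    void addTrigram(String left, String center, String right, long count) {
        counts.merge(key(left, center, right), count, Long::sum);
    }

    long count(String left, String center, String right) {
        return counts.getOrDefault(key(left, center, right), 0L);
    }

    // Returns the candidate center word with the highest trigram count; ties keep the original.
    String bestCenter(String left, String original, String right, List<String> candidates) {
        String best = original;
        long bestCount = count(left, original, right);
        for (String candidate : candidates) {
            long n = count(left, candidate, right);
            if (n > bestCount) {
                best = candidate;
                bestCount = n;
            }
        }
        return best;
    }
}
```

For "Let us meat today", the checker would score the trigram (us, ?, today) for "meat" and for each candidate such as "meet".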
