Optical character recognition (OCR) is the conversion of images of printed or typed text into machine-encoded text. This document discusses OCR: its definition, its purpose of improving the accuracy and speed of document processing, its steps (pre-processing, feature extraction, and classification of characters), and its common uses in digitizing printed texts. The system aims to recognize characters from different languages and fonts accurately and efficiently.
Made By: Dhairya Goel - 02814803115, Madhwan Sharma - 60214803115
DEFINITION
Optical Character Recognition (OCR) is the mechanical or electronic conversion of images of typewritten or printed text into machine-encoded text.
PROBLEM OVERVIEW
Humans are bound to make errors sooner or later, especially when continuously performing mundane, repetitive tasks such as digitization or security work.
Often we are unable to perceive certain digits because of factors such as motion, lack of digit clarity and illumination.
It is these problems that led us to delve into this topic.
PURPOSE
The main purpose of an OCR system based on grid infrastructure is to perform document image analysis, that is, to process electronic documents converted from paper formats more effectively.
This improves the accuracy of character recognition during document processing.
Here the OCR technique derives the meaning of the characters and their font properties from their bit-mapped images.
The primary objective is to speed up character recognition in document processing. As a result, the system can process a large number of documents in less time.
Since our character recognition is based on a grid infrastructure, it aims to recognize multiple heterogeneous characters that belong to different languages, with different font properties and alignments.
STEPS IN OCR
PRE-PROCESSING
Pre-processing deals with improving the quality of the image so that the system can recognize it better.
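As a rough illustration of such image clean-up, the sketch below applies a 3x3 median filter (one common form of noise removal) followed by Otsu thresholding (one common binarization method). Both choices, the function names and the list-of-lists grayscale image format are assumptions made for this example; the slides do not say which specific methods the system uses.

```python
from statistics import median


def median_filter(img):
    """Apply a 3x3 median filter; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def otsu_threshold(img):
    """Return the gray level (0..255) that maximises between-class variance."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]            # pixels with value <= t
        if w0 == 0:
            continue
        w1 = total - w0          # pixels with value > t
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img, t):
    """Map dark (ink) pixels to 1 and background pixels to 0."""
    return [[1 if p <= t else 0 for p in row] for row in img]


# Usage: a light page with a dark 3-pixel-wide stroke and one speck of noise.
page = [[200] * 5 + [30] * 3 + [200] for _ in range(5)]
page[2][1] = 0                     # isolated noise pixel
clean = median_filter(page)
binary = binarize(clean, otsu_threshold(clean))
```

A median filter suits isolated specks because a single outlier never becomes the median of its neighbourhood. It also erases strokes only one pixel wide, so the window size would need to match the scan resolution.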
Pre-processing consists of noise removal, deblurring, binarization and edge detection.
FEATURE EXTRACTION
Transforming the input data into a set of features is called feature extraction.
Feature extraction is performed on the raw data before the k-NN algorithm is applied to the transformed data in feature space.
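A minimal sketch of this two-stage idea follows: a binary glyph is turned into a short feature vector, and k-NN then labels it by majority vote among the nearest training vectors. The feature used here (ink density per zone of the glyph), the Euclidean distance, and the names `zone_features` and `knn_classify` are assumptions chosen for illustration; the slides do not state which features or distance the system uses.

```python
import math
from collections import Counter


def zone_features(glyph, zones=2):
    """Split a binary glyph into zones x zones cells; return ink density per cell.

    Assumes the glyph has at least `zones` rows and columns.
    """
    h, w = len(glyph), len(glyph[0])
    feats = []
    for zy in range(zones):
        for zx in range(zones):
            y0, y1 = zy * h // zones, (zy + 1) * h // zones
            x0, x1 = zx * w // zones, (zx + 1) * w // zones
            cell = [glyph[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell))
    return feats


def knn_classify(sample, training, k=3):
    """Label `sample` by majority vote among its k nearest training vectors.

    `training` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Usage: two hand-made 4x4 training glyphs and one slightly distorted input.
GLYPH_L = [[1, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 1, 1, 1]]
GLYPH_T = [[1, 1, 1, 1],
           [0, 1, 1, 0],
           [0, 1, 1, 0],
           [0, 1, 1, 0]]
training = [(zone_features(GLYPH_L), "L"), (zone_features(GLYPH_T), "T")]

noisy_l = [[1, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 1, 0, 0],
           [1, 1, 1, 1]]
label = knn_classify(zone_features(noisy_l), training, k=1)
```

Classification works on four numbers per glyph instead of sixteen pixels, so a small distortion in the input moves the feature vector only slightly and the nearest neighbour stays the same.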
Feature extraction yields properties that identify a character uniquely and differentiate between similar characters.
Example
CLASSIFICATION
USES
OCR is widely used as a form of data entry from printed paper records, whether passport documents, invoices, bank statements, business cards, mail or other documents.
It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online and used in machine processes such as machine translation, text-to-speech, key data and text mining.
CONCLUSION
OCR technology provides fast, automated data capture, which can save organizations considerable time and labour costs.
The system has advantages such as automation of mundane tasks, low time complexity, a very small database and high adaptability to untrained inputs, with only a small number of features to calculate.