* Submitted to: Miss Ayesha

* Submitted By: Mudasar Ellahi

* Class: BSCS (A) -1


➢ What is OCR? (Definition)

Often abbreviated OCR, Optical Character Recognition refers to the
branch of computer science that involves reading text from paper and
translating the images into a form that the computer can manipulate
(for example, into ASCII codes). An OCR system enables you to take a
book or a magazine article, feed it directly into an electronic computer
file, and then edit the file using a word processor.
Or
OCR (Optical Character Recognition) is the recognition of printed or
written text characters by a computer. This involves photo-scanning the
text character by character, analyzing the scanned-in image, and then
translating each character image into a character code, such as ASCII,
commonly used in data processing.
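The final translation step can be illustrated with Python's built-in `ord()`, which returns a character's code point; for plain English letters, digits and the space this equals the ASCII code. The recognized string below is a hypothetical output of the recognition stage, not taken from any particular OCR system.

```python
# Hypothetical text produced by the recognition stage of an OCR system.
recognized = "OCR 2"

# ord() gives each character's code point, which equals its ASCII code
# for plain English letters, digits and the space.
codes = [ord(ch) for ch in recognized]
print(codes)  # [79, 67, 82, 32, 50]

# Encoding with the "ascii" codec yields the same values as raw bytes.
assert list(recognized.encode("ascii")) == codes
```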

➢ How It Works
• All OCR systems include an optical scanner for reading text and
sophisticated software for analyzing images. Most OCR systems use a
combination of hardware (specialized circuit boards) and software to
recognize characters, although some inexpensive systems do it entirely
through software. Advanced OCR systems can read text in a large variety
of fonts, but they still have difficulty with handwritten text.

• In OCR processing, the scanned-in image or bitmap is analyzed
for light and dark areas in order to identify each alphabetic letter
or numeric digit. When a character is recognized, it is converted
into an ASCII code.
• Older OCR systems match these images against stored bitmaps
based on specific fonts. The hit-or-miss results of such pattern-
recognition systems helped establish OCR's reputation for
inaccuracy.

• Today's OCR engines add multiple neural network algorithms to
analyze the stroke edge, the line of discontinuity between the text
characters, and the background. Allowing for irregularities of printed
ink on paper, each algorithm averages the light and dark along the side
of a stroke, matches it to known characters, and makes a best guess as
to which character it is. The OCR software then averages or polls the
results from all the algorithms to obtain a single reading.
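The steps above can be sketched in toy form. The Python below assumes a made-up 5x5 font and uses three deliberately simple recognizers as stand-ins for the neural network algorithms real engines use: whole-pixel comparison against stored bitmaps (the older approach), plus ink counts per row and per column. Only the binarizing of light and dark areas and the final polling step mirror the description directly; the names `TEMPLATES`, `binarize`, `best_match` and `recognize` are hypothetical.

```python
from collections import Counter

# Hypothetical 5x5 "stored bitmaps" for a made-up font (1 = dark ink, 0 = paper).
TEMPLATES = {
    "I": ["00100", "00100", "00100", "00100", "00100"],
    "L": ["10000", "10000", "10000", "10000", "11111"],
    "T": ["11111", "00100", "00100", "00100", "00100"],
}
BITMAPS = {ch: [[int(c) for c in row] for row in rows] for ch, rows in TEMPLATES.items()}


def binarize(gray, threshold=128):
    """Split light from dark: 1 where a pixel (0 = black, 255 = white) is dark."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def flatten(bitmap):
    return [px for row in bitmap for px in row]


def row_ink(bitmap):
    return [sum(row) for row in bitmap]


def col_ink(bitmap):
    return [sum(col) for col in zip(*bitmap)]


def distance(a, b):
    """Sum of absolute differences between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(bitmap, feature):
    """Return the stored character whose feature values are closest."""
    return min(BITMAPS, key=lambda ch: distance(feature(bitmap), feature(BITMAPS[ch])))


def recognize(gray):
    bitmap = binarize(gray)
    # Three stand-in recognizers: whole-pixel comparison (stored-bitmap
    # matching) plus ink counts per row and per column.
    votes = [best_match(bitmap, f) for f in (flatten, row_ink, col_ink)]
    # Poll the recognizers; on a tie, the answer seen first wins.
    return Counter(votes).most_common(1)[0][0]


D, W = 30, 220  # hypothetical ink and paper intensities
noisy_t = [
    [D, D, D, D, 180],  # faded ink at the end of the top bar
    [W, W, D, W, W],
    [W, W, D, W, W],
    [90, W, D, W, W],   # stray smudge
    [W, W, D, W, W],
]
print(recognize(noisy_t))  # T
```

Despite the faded pixel and the smudge, all three stand-in recognizers pick "T" here. A single pixel-by-pixel comparison is exactly the kind of font-specific matching the text describes as hit-or-miss; polling several independent measurements is one simple way to make the final reading less sensitive to any one of them.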

➢ List of OCR Software


Many OCR software packages are available over the Internet. Here are
some of them:

• ExperVision
• ABBYY
• AnyDoc Software
• OmniPage
• Readiris

➢ Example
• Suppose you wanted to digitize the novel Moby Dick
overnight. You could stay up all night typing and still not
finish. Or you could use a high-end scanner and in minutes
scan all of author Herman Melville's works into a computer
using optical character recognition (OCR) technology.

➢ Usage of OCR
• The potential of OCR systems is enormous because they enable users
to harness the power of computers to access printed documents. OCR is
being used by libraries to digitize and preserve their holdings. OCR is
also used to process checks and credit card slips and to sort mail.
Billions of magazines and letters are sorted every day by OCR machines,
considerably speeding up mail delivery.
• For many document-input tasks, OCR is the most cost-effective
and speedy method available. And each year, the technology
frees acres of storage space once given over to file cabinets and
boxes full of paper documents.

➢ Ideal Source Material for OCR


• OCR works best with originals or very clear copies and with
mono-spaced fonts like Courier. If you have a choice, use the
following source material:
• A font size of 12 points or greater.
• Black text on a white background.
• A clean copy, not a fuzzy multi-generation copy from a copy
machine.
• Standard type fonts (Times New Roman, etc.); fancy fonts may
not be recognized.
• Single-column layout.
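The 12-point guideline can be translated into scanned pixels. This sketch assumes the desktop-publishing point of 1/72 inch and a few hypothetical scanner resolutions; the function name `points_to_pixels` is illustrative only. The nominal point size measures the font's em height, so individual letters occupy somewhat fewer pixels than the result.

```python
POINTS_PER_INCH = 72  # desktop-publishing (PostScript) point = 1/72 inch


def points_to_pixels(points: float, dpi: int) -> float:
    """Nominal font size converted to scanned pixels at a given resolution."""
    return points * dpi / POINTS_PER_INCH


for dpi in (150, 300, 600):  # hypothetical scanner settings
    print(f"12 pt at {dpi} dpi = {points_to_pixels(12, dpi)} px")
```

At 300 dpi the 12-point minimum works out to 50 pixels per em; a smaller font or a lower scan resolution leaves the software fewer light and dark pixels per character to analyze.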

➢ OCR Limitations
• Using text from a source with font size less than 12 points or from a
fuzzy copy will result in more errors.
• Except for tab stops and paragraph marks, most document formatting
is lost during text scanning (bold, italic and underline are
sometimes recognized).
• The output from a finished text scan will be a single-column editable
text file. This text file will always require spellchecking and
proofreading, as well as reformatting to the desired final layout.
• Scanning printouts of plain text files or spreadsheets usually
works, but the text must be imported into a spreadsheet and
reformatted to match the original.
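Because the scanned text always needs spellchecking, part of that proofreading can be assisted by software. The sketch below uses Python's standard `difflib.get_close_matches` to suggest the nearest dictionary word for each unknown word. The tiny `DICTIONARY`, the helper name `suggest_corrections`, and the sample OCR errors ("1" read for "l", "c" read for "e") are hypothetical, and a human proofreader still has to confirm each suggestion.

```python
import difflib

# Hypothetical word list; a real spellchecker would use a full dictionary.
DICTIONARY = ["optical", "character", "recognition", "converts", "printed", "text"]


def suggest_corrections(ocr_text, dictionary=DICTIONARY):
    """Map each unknown word to its closest dictionary word, or None."""
    suggestions = {}
    for word in ocr_text.lower().split():
        if word not in dictionary:
            match = difflib.get_close_matches(word, dictionary, n=1, cutoff=0.6)
            suggestions[word] = match[0] if match else None
    return suggestions


print(suggest_corrections("Optica1 charactcr recognition converts printed tcxt"))
# {'optica1': 'optical', 'charactcr': 'character', 'tcxt': 'text'}
```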

➢ What Source Material Doesn't Work Well for OCR?
• Forms (especially with boxes and check boxes)
• Very small text
• Multi-generation fuzzy or blurry copies from a copy machine
• Mathematical formulas
• Draft copies of documents with hand-written revisions
• Fancy text and unusual fonts
• Handwritten text
