
College of Engineering

Department of Computer Engineering

Senior Design II Report (COE 491)

Read2Me: A Reading Aid for the Visually Impaired

Anza Shaikh - 42554

Heba Saleous - 42416

Ragini Gupta - 49089

Date: October 28, 2015

Supervised By: Dr. Assim Sagahyroon

Abstract

The inability to read has a huge impact on the lives of the visually impaired. Nowadays, printed
text appears everywhere: on product labels, restaurant menus, instructions on bottles, signboards,
etc. Blind people therefore need assistance to read this text. In this context, our project focuses
on the development of a reader and an Android application that can translate an image of text
into audible speech for the user. Through this project we propose a portable camera device that
can be used to take images of any printed material, which are then processed and converted into
speech using Optical Character Recognition (OCR) in the cloud and Text-to-Speech (TTS)
software offline.

Acknowledgement

We would like to express our heartfelt gratitude to Dr. Assim Sagahyroon, our senior
design advisor, for his constant motivation and guidance throughout the project. We would also
like to sincerely thank Mr. Ricardo from the Mechanical Engineering Department for making the
stand for our project, and Mr. Wissam Abou Khreibe for guiding us on the development of the
web API and for ordering hardware for our project. Finally, we extend our thanks to our parents
and friends for their support and guidance during this work.

Table of Contents

I. Introduction
II. Read2Me on RPi
A. Previous Work
B. Statement of Problem
C. Functional Requirements
D. Non-Functional Requirements
E. Component Requirements
F. Design Objectives
G. Design Constraints
H. Technical Approach
I. Preliminary Design
J. Preliminary Cost Estimate
III. Read2Me: An Android Application
A. Previous Work
B. Statement of Problem
C. Functional Requirements
D. Non-Functional Requirements
E. Design Objectives
F. Design Constraints
G. Technical Approach
H. Preliminary Design
I. Preliminary Cost Estimates
IV. Testing
V. Comparison Between the Two Approaches
VI. Project Management
A. Preliminary Schedule
B. Gantt Chart
VII. Standards
VIII. Social Impact
IX. Conclusion
X. Future Prospects
XI. Appendix A
XII. Appendix B
XIII. Glossary
XIV. Bibliography

List of Figures
Figure 1: Image undergoes four steps in product information reader framework [2]
Figure 2: Block Diagram of Vision Based Assistive System [3]
Figure 3: The overall system with each hardware component [4]
Figure 4: Finger Reader [5]
Figure 5: Software portraying the detected text while finger scanning, and the extracted text in camera analysis [5]
Figure 6: Functional diagram for an automated pen explaining the different processes from capturing an image to reception by a ZigBee headset [6]
Figure 7: Simulation of a newspaper page for extracting the textual contents [6]
Figure 8: Tyflos Reader Prototype [7]
Figure 9: Aligned text within the four borders of an A4 size paper
Figure 10: Read2Me Prototype
Figure 11: Battery Pack with AA batteries
Figure 12: Overall configuration of the device with the camera, Raspberry Pi, remote control, and earphones
Figure 13: The stand with the camera module placed on top
Figure 14: Workflow of Read2Me on RPi
Figure 15: The wearable camera used during testing and the main view of the application [8]
Figure 16: The System Diagram of the Open Source OCR Framework [9]
Figure 17: Comparison of the median value of the string distance in images for three frameworks: Tesseract, Abbyy and Leadtools [10]
Figure 18: System architecture for Camera Reading for Blind [10]
Figure 19: Stand for the Android Phone
Figure 20: (a) The welcome screen of the Read2Me Android Application (b) Main screen of the application (c) Settings Activity (d) Processing (e) Result of OCR (f) Text is being read out loud
Figure 21: System Architecture of Read2Me Application
Figure 22: Read2Me Application Workflow
Figure 23: (a) Picture for German OCR (b) Picture for English OCR
Figure 25: OCR output for Android Application (English and German)
Figure 26: The amount of CPU power being used by the system before running Read2Me is 1%
Figure 27: The amount of CPU power being used while the text is being converted into audio is 26%
Figure 28: Power and RAM Consumption for Read2Me Application
Figure 29: Analysis chart between Read2Me Android Application and RPi
Figure 30: Gantt Chart
Figure 31: Gantt Chart (Continued)

List of Tables
Table 1: Example Positioning Commands [7]
Table 2: Example Reader Commands [7]
Table 3: Comparison of Arduino and Raspberry Pi [12]
Table 4: Raspberry Pi models comparison [13]
Table 5: Text types recognized by ABBYY OCR [15]
Table 6: Hardware Components
Table 7: The overall estimated cost
Table 8: Hardware Components (Android App)
Table 9: System Costing (App)
Table 10: Design Costing (App)
Table 11: The overall schedule of the project
Table 12: Image formats supported by Abbyy [14]
Table 13: Usage Cases between Cloud OCR SDK and Mobile OCR engine [18]
Table 14: Attributes between Cloud OCR SDK and Mobile OCR engine
Table 15: Development and Deployment between Cloud OCR SDK and Mobile OCR engine
Table 16: Business Model between Cloud OCR SDK and Mobile OCR engine
Table 17: Specifications of Raspberry Pi 2 Model B
Table 18: Specifications of Raspberry Pi Camera Module

I. Introduction

With recent advances in technology, the technological world is obliged to find creative
solutions that would assist the visually impaired in leading a better life. Approximately 285
million people around the globe suffer from visual impairments, 39 million of whom are completely
blind. According to the World Health Organization (WHO) [1], 1.4 million blind individuals are
children under the age of 15, and 90% of people with visual impairments live in low- and
middle-income countries.

However, despite entrenched research efforts in this field, the world of print
information such as newspapers, books, signs, and menus remains mostly out of reach to the
majority of visually impaired individuals.

In an effort to find an answer to this persistent problem, an assistive platform, referred to
in this work as Read2Me, is developed in this project to be used by these individuals.

Since most visually impaired people struggle in the work environment and often end up in
low-income jobs, one of the goals of this project is to design a cost-effective technology.
The main goal of this project is to devise a reading aid with the following features:
Small size
Lightweight
Efficient in using computational resources
Low cost

In this project two approaches are explored in order to build Read2Me; these approaches can
be summarized as follows:

Approach 1: The first design consists of a Raspberry Pi microcomputer with its compatible
camera module mounted on a wearable or standalone device, such as glasses or a stand, that
the user already owns or has to purchase. The RPi will run Optical Character Recognition (OCR)
on the image captured by the camera, followed by Text-to-Speech (TTS) synthesis. A picture will
be taken using the Raspberry Pi's camera module and is then sent to the cloud where the OCR
takes place. The text resulting from this process is then downloaded back onto the Raspberry Pi
and processed into audio on the device itself before being read aloud through speakers or a
headset.

Approach 2:

Mobile phones are among the most commonly used electronic gadgets today. They are no
longer just a wireless voice communication tool but have emerged as a powerful computing
platform capable of performing intensive processing in real time. Smartphones with powerful
microprocessors, high-resolution cameras, and various embedded sensors like accelerometers
and GPS are on the rise in today's world. This has led to the emergence of exciting social
applications on the mobile platform, like business card readers, document scanners, file
converters, smart tour guides, translators, etc.

In this paper, we will explore the development of a framework on a smartphone platform by
designing an OCR-based application coupled with speech synthesis for audible results. The
phone's integrated camera is used to take the photo, and the image is then sent to the same
OCR cloud used by the Raspberry Pi device. The text result is then downloaded and converted
into audio using the phone's built-in TTS software, and read aloud.

These approaches will be discussed in further detail throughout the upcoming sections of
this report. After the in-depth discussion of each approach, the two will be compared in order
to decide which is more efficient in which areas.

II. Read2Me on RPi

A. Previous Work:
The journal article in [2] introduces a camera-based product information reader
framework to help blind persons read product information. This reader consists of a
low-cost webcam which acts as the main vision sensor in detecting the label image of the product.
The image is converted into grayscale through thresholding and noise removal and is then
processed by separating the label from the image using MATLAB and the Canny Edge Detection
Algorithm for image segmentation. Following that, Maximally Stable Extremal Regions (MSER)
are used for automatic text extraction, followed by region filtering as shown in Figure 1. This
reader makes use of the template matching algorithm for OCR after the MSER output has
undergone text extraction. The text in the output text file from OCR is matched against the saved
product names in the database and the matched product is identified. For each product name in
the database, a corresponding audio file is saved containing the complete information of the
specified product, which is played upon finding a match. One advantage of this reading
technology is that it can handle tilted text. However, for this reader to work accurately, the
characters of the text must have high contrast against the background, otherwise MSER will
not work, thereby rendering incorrect results. Moreover, this framework is limited to reading
product labels, and only those labels that have been saved in the database. Furthermore, the
testing of this framework hasn't revealed any promising results on the speed of image processing
and audio reception.

Figure 1: Image undergoes four steps in product information reader framework [2]

The journal article in [3] presents a camera-based assistive text reading framework to help
blind people read text labels and product packaging from hand-held objects in their daily lives.
The system framework consists of three functional components: scene capture, data processing,
and audio output. A mini laptop running Ubuntu 12.04 as the operating system has been used as
the processing device in this prototype system. The hardware for this framework consists of an
ARM11 processor as shown in Figure 2. In this framework, a webcam has been used to capture
images/videos. In the case of video, the frames are segregated and undergo pre-processing. What
makes this framework interesting is that, to extract the hand-held object of interest from other
objects in the camera view, the user is supposed to shake the hand-held object containing the
text they wish to identify. A motion-based method is then used to localize the object against the
cluttered background. Off-the-shelf OCR software is used for text recognition, and for TTS, the
Microsoft Speech Software Development Kit is used to load the script files where the recognized
text codes are recorded. However, this framework requires shaking of the objects for extraction
followed by motion detection techniques, which means the processor has to undergo heavy
processing. Moreover, the article does not mention the time it takes to receive the audio output.
Also, this framework is limited to labels and would not be able to handle large amounts of text.

Figure 2: Block Diagram of Vision Based Assistive System [3]

The article in [4] describes a Raspberry Pi device being used to read small
text, as well as detect humans and vehicles up to a certain distance ahead of the user, as shown in
Figure 3. The camera attached to this device is a standard USB webcam. In [4], OpenCV, a
computer vision library originally developed by Intel that is capable of real-time image
processing, is used. The system also provides a graphical user interface designed using Qt, a C++
application framework used for such applications, as well as some server command line tools and
commands. However, a graphical user interface will be almost impossible for the blind to interact
with. Moreover, using OpenCV means occupying a large amount of memory on the SD card as
well as utilizing more processing power, which will reduce the battery life of the RPi and will
also slow down other processes on the RPi.

Figure 3: The overall system with each hardware component [4]

The journal article in [5] elaborates on an innovative device named the Finger Reader (as
shown in Figure 4) that serves as a wearable, ring-like device to support text reading for the
visually impaired. The device was designed for the blind in response to several difficulties
encountered while reading text with existing technology, such as alignment issues, mobility,
accuracy, positioning, and efficiency. The Finger Reader introduces an innovative concept of
local sequential text scanning to read big blocks of text line by line. It can also be used for
skimming the major parts of a text, with auditory feedback provided to the user in parallel. The
hardware implementation of this device comprises two vibration motors fixed on the top and
bottom of the ring that give haptic feedback through signal patterns, like pulsing, to guide the
user in which direction to move the camera. In accordance with the hardware design, a software
stack is also implemented (as depicted in Figure 5) on a PC application that comprises a text
extraction algorithm, a hardware control driver, Tesseract OCR, and the Flite text-to-speech
software. The text extraction algorithm includes complex image binarization and selective
contour extraction methods that aid in refining the line equations sequentially before sending
them to the OCR engine. From the OCR engine the user is able to listen to each word that falls
under his/her finger, and at the end of every line read, the system triggers auditory feedback.
However, one major drawback of this device is that as the user moves progressively from one
line to another, the audio feedback sent to the user is segmented instead of continuous, which
confuses the user in positioning the device on each line.

Figure 4: Finger Reader [5]

Figure 5: Software portraying the detected text while finger scanning, and the extracted text in camera analysis [5]

The engineering report in [6] discusses the implementation of an automated electronic
pen to aid the visually impaired in reading and understanding textual content, as shown in Figure
6. The pen contains a pinhole camera which captures an image of the text highlighted by the
pen; this image is then input to an Intelligent Word Recognition system to convert the image
into text, as shown in Figure 7. The text is then read out aloud using a text-to-speech
converter. This work uses ZigBee technology for the transmission and reception of the audio
signals. However, this paper did not consider the amount of training a blind person would require
to place the pen on the words to be read. This could be a major problem, since a blind person
would likely be unable to accurately place the pen on the words, thereby rendering inaccurate
results. Moreover, the testing of this technology hasn't revealed any promising results on the
speed of audio reception.

Figure 6: Functional diagram for an automated pen explaining the different processes from capturing an image to reception by a
ZigBee headset [6]

Figure 7: Simulation of a newspaper page for extracting the textual contents [6]

Figure 8: Tyflos Reader Prototype [7]

The journal article in [7] discusses a wearable document reader known as the Tyflos
Reader (shown in Figure 8) for the visually impaired. The device is a pair of glasses with two
stereo vision cameras mounted on top, one on either side. The device reads out the text extracted
from the images captured by the two stereo vision cameras, and a microphone captures the user's
voice. This device uses image processing technology similar to that described above; however,
the interesting aspect of this reader is that it not only provides speech feedback (commands
shown in Table 1) but also takes speech commands (i.e., a voice user interface) from the user and
acts appropriately. Some of the user commands are shown in Table 2. Moreover, this device also
uses page segmentation through the Zeta Scanning Algorithm, which segments the document
image into textual blocks depending on the font size. This was done specifically for newspapers
so that the headlines could be separated from the supporting text. The primary processing device
is a PDA/laptop, which implies that the user needs to purchase one before using the device.
Moreover, the voice user interface might not function well in a noisy environment, rendering it
limited to indoor use.

Table 1: Example Positioning Commands [7]

Table 2: Example Reader Commands [7]

B. Statement of Problem:
Despite the availability of extensive studies on the theme proposed in this paper, we
observe that the approaches discussed in the literature review have shortcomings in real-life
scenarios, such as image capture and efficiency of text recognition when conditions are not
ideal. Also, most of the systems developed are built using expensive hardware components that
stand beyond the reach of many visually impaired people. After reviewing the published
literature, we can infer that there is no technology small and cost-effective enough for the
visually challenged to carry at all times, and fast enough to match the pace of reading of a
person with normal sight. We intend to introduce a product that will assist the visually
challenged in reading short and specific text, such as restaurant menus and labels on medicine
bottles, as well as literary books. In an effort to prototype a device comprised of inexpensive
components, we present a cost-effective product that is within the reach of the vast population
of the visually impaired.

C. Functional Requirements:
The proposed system should have the following functional requirements:

FR1. The system must allow the user to take pictures of the intended reading material.

FR2. The system must take photos of the text to be converted into audio.

FR2.1. The photo is stored on the Raspberry Pi and is then transmitted to the OCR Cloud.
The photo must be overwritten when a new photo is taken, to save memory.

FR3. The system must be capable of sending the photos to the OCR Cloud.

FR3.1. The photos will be pre-processed and then converted into text.

FR3.2. The resulting text file will be downloaded back onto the Raspberry Pi.

FR4. The system must convert text into audio on Raspberry Pi.

FR4.1. The text file resulting from FR3 will be processed into audible speech.

FR5. The system must playback the audio into either speakers or a headset.

FR6. The system must allow the user to play, pause and replay the audio currently being played
to them.

FR7. The system must include a clicker with five buttons for the user to take photos and control
audio.

FR7.1. The buttons should have unique uses, such as capturing the image, pause, play,
and replay capabilities, and alternating the language between English and French.

FR8. The clicker must have its buttons labeled with carved letters for the user to differentiate
between the buttons.

FR9. The system must produce audio output in case any errors occur, such as OCR failure or
insufficient credits for OCR.

D. Non-Functional Requirements:


The non-functional requirements include:

NFR1. The photos taken by the camera should be clear.

NFR2. The OCR software should clean up the picture taken for accurate conversion.

NFR3. The text file resulting from the OCR software should be formatted neatly and
consistently.

NFR4. The audio file resulting from the TTS software should be clear and played at a
reasonable speed.

NFR5. The overall system should be portable for the user to take wherever they want.

NFR6. The system should have enough battery power to last for at least 5 hours.

NFR7. The physically carried components should not become too hot while being handled.

NFR8. The system should convert the image to audio as fast as possible.

NFR9. The buttons on the clicker should be clearly distinguishable to the user.

E. Component Requirements:
Most visually impaired people are distinguishable by the thick, black glasses they
usually wear to cover their eyes. From that observation, we decided to use these glasses as the
platform for the project. Like people who wear normal glasses to see, the blind will be able to
use these glasses to read. To make this possible, a Raspberry Pi unit will be designed and
implemented for this purpose. It will be integrated with a camera module and interfaced with a
web API. The weight of this whole unit will not exceed 2 lbs.

For this prototype, three separate devices will be used: the glasses to be worn, a small
case to accompany them, and a clicker to be carried while the glasses are in use. The components
are listed below:

i. Glasses components:

The glasses will contain the following parts:

Camera
Speakers/Headset

The main functionality of the camera is to capture pictures of whatever the lens is aimed
at. These pictures are then stored in the microcontroller's memory.

The speakers or headset provide the main output of this product, which is the audio
playback of the text detected from the images sent to the OCR cloud. The audio will be
generated on the Raspberry Pi once the image has completed its processing cycle. The audio
played back to the user should be clearly audible in the intended language.

ii. Other Portable components:

The following components will be included in the case:

Raspberry Pi 2 Model B microcomputer
Battery Pack
MicroSD Card (8GB)
Clicker

The 8 GB SD card that is inserted into the Raspberry Pi will act as the memory of the
system. The operating system, images and audio results will be stored on this card.

iii. Software Components:

The main software modules are:

Abbyy Cloud OCR
TTS Software
Python Web API

The Abbyy Cloud OCR software is a high-end OCR package that uses image processing
technology to detect and recognize characters in digital text documents of varying quality,
including low-light, low-quality documents. It uses preprocessing to detect text
orientation, correct an image's resolution, and remove texture from the image. This software will
be utilized in our system due to these features and its ability to do all of its processing on a cloud
system rather than on the microcontroller itself.

A TTS program will be installed on the Raspberry Pi to receive the text product of the
OCR software. An audio product will be created out of the processed text, containing the text
intended to be read to the wearer of the glasses.

A web API in the Python language will be executed in order to allow communication
between the Raspberry Pi and the Abbyy OCR software. The API will allow automated
communication with the cloud, detecting the image file that has most recently been added to the
microcontroller's memory and sending it to Abbyy for preprocessing and conversion.

F. Design Objectives
An overall view of the system and its various components is shown in Figure 12. The design
objectives are summarized as follows:
The system should have a clicker attached to the RPi through a wired connection. The clicker
consists of 5 buttons, namely Exit, Capture, Play/Pause, Replay, and Language, as shown in
Figure 12.
Since the Read2Me system is intended for indoor as well as outdoor use, the best choice for
powering the Pi board away from a wired supply is a battery power pack.
The system will be connected to a battery pack that will regulate powering the RPi ON and
OFF.
The camera is portable, so it can be mounted on any wearable device such as glasses or on a
tripod stand designed particularly for this project.
The system will allow the visually impaired to take images of any printed material they desire
to read at the click of the Capture button on the clicker.
The clicker serves as a remote control to perform specific tasks.
The Capture button of the clicker will start the camera with a self-timer of 5 seconds, allowing
the user to stabilize the camera or the document's position before the image is captured; the
camera captures automatically after this 5-second delay.
The device will be able to scan and read a wide range of English printed material, such as
books, bills, documents, etc.
The image taken by the wearable glasses will be saved on the SD card so that it can be sent to
the cloud. Only the last image captured by the camera will be stored.
A Python script interfacing with an API to communicate with the cloud server will execute on
the Raspberry Pi at the press of the Capture button. This will send the image to the ABBYY
cloud system for optical recognition of the text in the image.
A text-to-speech tool will be installed on the RPi that can synthesize a human voice from the
provided string of the text file.
Only the most recent audio file (record.mp3) will be stored on the SD card of the RPi after
speech synthesis.
Speakers will be attached to the system to play the audio feedback in the desired language
with a female voice.
The Exit button on the clicker will shut down the RPi.
By default, the language set for TTS is English.
The Language button on the clicker will allow the user to alternate between English and
French.
If the quality of the image captured by the camera is not adequate for recognition, an audio
error message will play, prompting the user to capture the image again by pressing the
Capture button on the clicker.
An audio error message will also be played if the subscription to the OCR service has expired,
as insufficient credits will prevent image processing.
An internet connection is required to communicate with the cloud OCR server for image
processing.

G. Design Constraints

The system has the following limitations:


One of the major constraints in our system will be the correct positioning and alignment of
the camera while taking a photo of the printed text from the wearable glasses. If the user
captures the image with incomplete text (like words cropped at the edges), it will lead to
inaccurate results. This can be avoided if the text on the paper, such as an A4-size page, is
well spaced and aligned within the four edges. We recommend a minimum margin of 2.38 inches
from the top edge of the page and 1.0 inch from the left, right, and bottom edges (whether the
page is held horizontally or vertically). If the text does not lie within the four borders of the
camera frame, it is likely to get cropped.

Figure 9: Aligned text within the four borders of an A4 size paper

The minimum distance between the camera and the text document for accurate
recognition should be 0.5 meters in all cases, irrespective of the text content or the font size
of the document. The maximum distance should be 1 meter. If the camera is too close to or too
far from the text document, it will capture blurred images, causing inaccurate results.
For the images, only the most recent image will be stored on the SD card, to preserve the
capacity of the SD card.
As for the audio clips, only the last audio clip gets stored on the SD card at any one time.
Since the camera module used does not support night vision, the user has to be in a well-lit
environment to capture images.
The user can only take an image after the processing and audio feedback for the last image
has been completed in order to avoid queueing.
There is real-time latency in the image transmission from the RPi to the OCR cloud.

H. Technical Approach:

The literature review showed that most of the image processing took place on a PDA or a
small laptop, which required a certain amount of memory for processing. We decided to use an
OCR cloud system known as the Abbyy Cloud OCR SDK for the OCR processing, since it was
designed to handle large text documents of varying qualities and types. We intend to send the
images captured by the camera module to the OCR cloud using a Wi-Fi adapter that connects
the RPi to the OCR cloud system. The Abbyy OCR cloud will be responsible for processing and
converting the image into text and sending the processed output, a text (.txt) file, back to the
RPi. The .txt file is then stored locally in the RPi memory and is then synthesized into speech
using SVOX Pico TTS on the RPi. An internet connection is required for this. The various
components of the system are shown in Figure 10.
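To make this flow concrete, a minimal sketch of one capture-to-speech cycle is given below. It is a sketch rather than the final implementation: abbyy_ocr() and speak() are placeholders that are fleshed out in the OCR and TTS discussions later in this section, and the file paths are illustrative.

    # Sketch of one Read2Me cycle on the RPi. abbyy_ocr() and speak() are
    # placeholders, detailed in subsections iii and v below; the file paths
    # are illustrative, not final.

    def abbyy_ocr(image_path):
        """Placeholder: upload the image to the ABBYY cloud, return the text."""
        raise NotImplementedError("see the REST client sketch in subsection iii")

    def speak(text):
        """Placeholder: synthesize and play the text with SVOX Pico TTS."""
        raise NotImplementedError("see the Pico TTS sketch in subsection v")

    def read2me_cycle(image_path="/home/pi/capture.jpg"):
        text = abbyy_ocr(image_path)       # image -> cloud OCR -> recognized text
        with open("/home/pi/result.txt", "w") as f:
            f.write(text)                  # only the latest result is kept
        speak(text)                        # offline speech synthesis and playback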

Figure 10: Read2Me Prototype

There are several alternatives for the components that could be used for the proposed system.
The team went step by step through the analysis of the possible alternatives and the selection
of the best one.

i. Selection of the Power Supply:

Since the device is intended to be portable and ready to use anywhere, the team decided to
power the Raspberry Pi with a battery pack. There are two options: a battery pack with four AA
batteries and a USB battery pack.

Battery Pack with four AA batteries:

An AA battery has a capacity of about 2.3 Ah (it can deliver 2.3 A for one hour), and the
RPi consumes about 0.7 A, so in theory a set of AA cells of appropriate total voltage could power
the Pi for 2.3/0.7 hours, i.e., just over 3 hours.
USB Battery Pack:

This contains a 4400 mAh lithium-ion battery, a charging circuit, and a boost converter
that provides 5 V DC at up to 1 A via a USB port. According to [11], a 'headless' Pi (no
keyboard, mouse, or display attached) with a mini Wi-Fi adapter plugged into the USB port
lasted 7 hours.

With the above description, it seems obvious that one would go for the USB battery pack.
However, it automatically shuts off if the device isn't drawing a lot of power, since it is meant
for charging cell phones. Also, when the pack starts and stops charging, its output flickers,
which can cause a 'power sensitive' device like the RPi to reset. Therefore, we decided to use
the AA batteries to power the RPi. They might not last as long, but they are cheaper and more
reliable than the USB battery pack. Moreover, this battery pack has an on/off button, which
means the RPi can be isolated from the power supply at the flick of a switch. The battery pack
with AA batteries is shown in Figure 11.

Figure 11: Battery Pack with AA batteries

ii. Selection of the Microcontroller:

Many microcontrollers support media transfer, but we narrowed our choices down to the
Arduino and the Raspberry Pi because of their popularity and support for REST APIs. The team
had to pick one board to establish connectivity between itself and the cloud, as well as to
support motion sensors so that the device could later be scaled with more functionality. Table 3
summarizes the advantages and disadvantages of the Arduino and the Raspberry Pi.
Name          | Arduino Uno    | Raspberry Pi 2
Model Tested  | R3             | Model B
Price         | $29.95         | $35
Size          | 2.95" x 2.10"  | 3.37" x 2.125"
Processor     | ATmega328      | ARMv7
Clock Speed   | 16 MHz         | 900 MHz
RAM           | 2 KB           | 1 GB
Flash         | 32 KB          | micro SD card
EEPROM        | 1 KB           | (not listed)
Input Voltage | 7-12 V         | 5 V
Min. Power    | 42 mA (0.3 W)  | 700 mA (3.5 W)
Digital GPIO  | 14             | 8
Analog Input  | 6 (10-bit)     | N/A
PWM           | 6              | (not listed)
TWI/I2C       | 2              | 1
SPI           | 1              | 1
UART          | 1              | 1
Dev IDE       | Arduino Tool   | IDLE, Scratch, Squeak/Linux
Ethernet      | N/A            | 10/100
USB Master    | N/A            | 4 USB 2.0
Video Output  | N/A            | HDMI, Composite
Audio Output  | N/A            | HDMI, Analog
Table 3: Comparison of Arduino and Raspberry Pi [12]

As can be inferred from Table 3, the Raspberry Pi 2 is 56 times faster than the Arduino and
has 500,000 times more RAM; since our project involves sending multimedia, RAM size and
fast processing are among the main requirements we have to meet.

According to [12], the Raspberry Pi is best suited for projects that require a graphical
interface or the internet, and because of its various inputs and outputs it also tends to be the
preferred board for multimedia projects. Hence we chose the Raspberry Pi as the primary
controlling device for our project. Furthermore, the Arduino does not support audio output,
which is a primary requirement for our project.

As per [13], out of the four Raspberry Pi models, we chose to work with the RPi 2 Model B,
since it provides additional USB ports in comparison to the Model A and B+. In addition, the
audio circuitry on the RPi 2 is improved, with an integrated low-noise power supply.
Furthermore, the RPi 2 has a quad-core processor, which promises faster processing than the
other three models mentioned, and it has Windows support as well. The table below gives a
comprehensive comparison of the four RPi models:

Table 4: Raspberry Pi models comparison [13]

In conclusion, we decided to select the RPi 2 Model B as our platform of choice because of
the advantages summarized above.

iii. Selection of the OCR Software:
Two options were considered for OCR: Tesseract and the Abbyy Cloud OCR SDK.

Tesseract:

Tesseract is probably the most accurate open source OCR engine available. Combined
with the Leptonica image processing library, it can read a wide variety of image formats and
convert them to text in over 60 languages. It works on Linux, Windows, and Mac OS, and
converts .jpg images to plain .txt files. One of the major advantages of Tesseract is that it is
free to use and works offline, i.e., no internet connection is required. However, Tesseract needs
an image taken with a camera of at least 5.7 MP (the camera used for this project is only 5 MP),
it takes about 3 to 10 minutes to process an image of about 300 words (roughly one book page),
and it is unable to recognize handwriting. Moreover, according to the journal article in [10] on
camera reading for blind people, Tesseract had lower accuracy than Abbyy. The accuracy could
be improved using software like OpenCV for preprocessing (such as isolating the background,
fixing fonts, etc.), but that means more processing power and delay in the output. These
disadvantages led to the elimination of Tesseract as our choice of OCR framework.

Abbyy Cloud OCR SDK:

Since OCR and conversion processing is CPU intensive, it would require a great amount
of power. To keep the power consumption of the RPi to a minimum and to increase the speed of
the OCR, we propose an alternative: the processing is done in the cloud using the ABBYY
Cloud OCR SDK. The service is platform-independent because it is accessible through a web
API rather than running on the device itself, so a client can be developed under any OS
(Android, iPhone, Windows Phone 7, BlackBerry, Windows, Linux, Mac OS, iOS, etc.),
enabling cross-platform applications. The only thing required is an internet connection.
However, the software is commercial and will therefore charge for its use after the free trial
expires. Since ABBYY incorporates pre-processing and post-processing stages for the OCR-ed
text, it stands out as the optimal platform for character recognition. It eliminates the overhead
cost of improving image quality before extracting text from the image.

This software is not limited to the recognition of paper text; it also supports barcode
recognition, hand-printed text recognition, and business card recognition, and it supports up to
198 recognition languages, including French, English, and Spanish.

Summarized below are some of the characteristics of the ABBYY OCR:

1. Web API for the Abbyy OCR Cloud:

A client-server architecture is followed for communication between the Raspberry Pi and
the ABBYY cloud system. A web API will be developed and hosted on the Raspberry Pi to
communicate with the cloud system via REST services. REST web services are preferred over
SOAP because invoking requests from the ABBYY cloud system is less complex using the URL
approach instead of XML.
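As an illustration of this client, a minimal Python sketch of the REST exchange is given below. It assumes the processImage and getTaskStatus endpoints of the ABBYY Cloud OCR SDK and the third-party requests library; the application ID and password are placeholders for real SDK credentials.

    import time
    import xml.etree.ElementTree as ET
    import requests

    BASE = "https://cloud.ocrsdk.com"
    AUTH = ("APP_ID", "APP_PASSWORD")   # placeholders for ABBYY SDK credentials

    def abbyy_ocr(image_path, language="English"):
        """Upload an image, poll until recognition finishes, return the text."""
        with open(image_path, "rb") as f:
            r = requests.post(BASE + "/processImage",
                              params={"language": language, "exportFormat": "txt"},
                              data=f, auth=AUTH)
        task = ET.fromstring(r.content).find("task")
        while task.get("status") in ("Queued", "InProgress"):
            time.sleep(2)   # recognition is asynchronous, so poll the task status
            r = requests.get(BASE + "/getTaskStatus",
                             params={"taskId": task.get("id")}, auth=AUTH)
            task = ET.fromstring(r.content).find("task")
        if task.get("status") != "Completed":
            raise RuntimeError("OCR failed with status " + str(task.get("status")))
        return requests.get(task.get("resultUrl")).text   # download the .txt result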

2. Image formats supported by Abbyy:

There are 12 image formats supported by ABBYY. The popular ones include JPEG,
PNG, PDF, BMP. For the rest of the image formats, refer to Appendix A.

3. Text types recognized by Abbyy:

Some of the text types that can be recognized by ABBYY OCR are listed below.

Printed text type | Description
Normal     | A common typographic type of text, such as Arial, Times New Roman or Courier.
Gothic     | Text printed with the Gothic type and used for Gothic recognition.
Typewriter | Text typed on a typewriter.
Matrix     | Text printed on a dot-matrix printer.
Index      | Special set of characters including only digits written in ZIP-code style.
OCR_A      | A monospaced font designed specifically for OCR. It is largely used by banks, credit card companies and similar businesses.
OCR_B      | A font designed specifically for OCR.
MICR_E13B  | Special numeric characters printed in magnetic ink. MICR (Magnetic Ink Character Recognition) characters are found in a variety of places, including personal checks.
MICR_CMC7  | Special MICR barcode font (CMC-7). It is used on bank checks.

Table 5: Text types recognized by ABBYY OCR [15]

iv. Selection of Internet Connectivity:


The team considered three possible ways to get internet connectivity on the RPi for
OCR:

Wi-Fi Dongle:

The team considered the Edimax EW-7811 Wi-Fi adapter, which supports a 150 Mbps
802.11n wireless data rate, but the speed of the internet connection will depend on the wireless
network the adapter is connected to. The wireless adapter is only suitable for indoor use or
places where a wireless network is available. Moreover, connecting to password-protected
networks requires entering the password, which might be difficult for the visually impaired to
do. Even if someone else helps the person connect the device to the wireless network, a display
as well as a laptop would be needed to do it.

Internet Key/Dongle:

We considered the Internet Key/Dongle provided by Du or Etisalat, the telecom networks
in the United Arab Emirates. After thorough research, the team found that the cost of the
data plan provided by Du is cheaper than that of Etisalat (4G LTE). Also, to use the USB
modem from Etisalat, the device needs to be locked to Etisalat only. The package that suited our
needs was the Internet Key supporting 21.6 Mbps, at a price of 149 AED with an included data
allowance of 4 GB. For a higher data rate, the user can purchase a costlier Internet Key.

LAN Connection:

Since the RPi has an Ethernet port, this was considered as a possibility as well. This option
works only for those who have LAN connections in their homes. This method of connecting to
the internet is the easiest because no additional packages need to be installed and no
configuration needs to be done on the RPi. The speed of the connection depends on the LAN
being used, which is usually faster than wireless networks.

The team decided to go with the Internet Key because this project requires internet
connectivity on the move. The user will not have to go through the hassle of connecting to a
wireless network before using the device and will not be restricted to indoor use. This option
might be costly, but it is the best option for outdoor use.

v. Selection of TTS Software:


The team considered the following options for TTS because of their popularity:

Festival:

Festival is free TTS software which converts text files into .wav files. It works, but it
produces a rough, robotic-sounding voice, which is not what we are aiming for in our project.
Also, after saying a word or a sentence, Festival needs a pause of approximately 5 seconds
before accepting more text.

Flite:

Flite (festival-lite) is a small, fast run-time synthesis engine developed using Festival for
small embedded machines. This software also works offline and produces a better quality voice
than Festival. Unlike Festival, its start-up time is very short (less than 25 ms on a PIII
500 MHz), making it practical (on larger machines) to call it each time you need to synthesize
something.

eSpeak:

eSpeak is a more modern speech synthesis package than Festival. eSpeak is a compact
open source software speech synthesizer for English and other languages, for Linux and
Windows. eSpeak uses a "formant synthesis" method. This allows many languages to be
provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural
or smooth as larger synthesizers which are based on human speech recordings. eSpeak works
offline and has male as well as female voices. It supports up to 51 languages. The only
drawback is that the voice is too robotic and difficult to understand.

SVOX Pico TTS:

Pico TTS is Google's Android TTS engine and works offline. This engine produces quite
good voices that sound natural. It supports only up to five languages, but the quality of this
engine outweighs other offline engines.

Google Translate TTS:

For Google TTS, the text is sent to Google's servers to generate a speech mp3 file,
which is then returned to the RPi and played using a media player. This means an internet
connection is required for it to work, but the speech quality is better than any of the offline TTS
software available. The service is, unfortunately, limited to 100 characters, which means one
would have to break the text down and make separate requests to Google. A major drawback of
this TTS service is that sometimes the server goes down and throws a "Service unavailable"
exception, rendering the service useless.
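For illustration, the chunking itself (independent of the Google endpoint, which is not shown here) could be done as in the sketch below, which splits the text at word boundaries into pieces of at most 100 characters.

    def chunk_text(text, limit=100):
        """Split text at word boundaries into pieces of at most `limit`
        characters, one per TTS request. A single word longer than `limit`
        is kept whole and would still exceed the limit."""
        chunks, current = [], ""
        for word in text.split():
            candidate = (current + " " + word).strip()
            if len(candidate) <= limit:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = word
        if current:
            chunks.append(current)
        return chunks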

Acapela:

Acapela TTS is TTS software for Linux embedded systems. The voice quality is excellent,
and it supports a large variety of languages as well as voices. The drawbacks of Acapela are
that it is not free and that it is accessed through an API, which means this service also requires
an internet connection to work.

Based on the above reasoning, the team decided to use Pico TTS for the project. The
team wanted to use an offline TTS service to save the maximum amount of data on the Internet
Key.
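As a sketch of how Pico TTS can be driven from Python, the snippet below shells out to the pico2wave command-line tool (packaged on Raspbian in libttspico-utils). Note that pico2wave emits a WAV file; producing the record.mp3 mentioned earlier would need an extra conversion step, so the output path here is illustrative.

    import subprocess

    def synthesize(text, lang="en-US", wav_path="/home/pi/record.wav"):
        """Synthesize text offline with SVOX Pico; the language codes en-US
        and fr-FR match the English/French toggle on the clicker. Each call
        overwrites the previous file, as the design requires."""
        subprocess.check_call(["pico2wave", "--lang", lang,
                               "--wave", wav_path, text])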

vi. Selection of Software used for pause/play/replay functionality:


Two software packages were considered for this purpose because of their popularity:
Pygame:

Pygame is a set of Python modules designed for writing games. Pygame adds
functionality on top of the excellent Simple DirectMedia Layer (SDL) library. It allows fully
featured games and multimedia programs to be created in the Python language. The Pygame
mixer music module can be used to load and play music. One advantage of using Pygame is
that it comes pre-installed on the RPi. Moreover, Pygame has abundant methods for loading
sound objects and controlling playback.

Pyglet:

Pyglet is a cross-platform windowing and multimedia library for Python. When graphics
are not needed, Pyglet's media player is more elegant and better maintained than Pygame's.
However, to resume playback in Pyglet there is a next() method, which does not guarantee
gapless playback, meaning some of the audio might go unheard. Also, there is no stop method
for audio in Pyglet.

Both are interface packages with built-in A/V support; however, because of the
disadvantages of Pyglet mentioned above, its small community, and its thin documentation,
the team decided to use Pygame.
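The sketch below shows how the clicker's play, pause, and replay functions map onto Pygame's music module; the file path is illustrative, and the button wiring itself is covered in the System Design subsection.

    import pygame

    pygame.mixer.init()
    paused = False

    def play_latest(wav_path="/home/pi/record.wav"):
        """Load and play the most recent audio file."""
        pygame.mixer.music.load(wav_path)
        pygame.mixer.music.play()

    def toggle_pause():
        """Play/Pause button: pause or resume the current audio."""
        global paused
        if paused:
            pygame.mixer.music.unpause()
        else:
            pygame.mixer.music.pause()
        paused = not paused

    def replay():
        """Replay button: restart the most recent audio from the beginning."""
        global paused
        paused = False
        pygame.mixer.music.play()   # play() restarts from the beginning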

vii. Selection of the Programming Language:


The RPi can be programmed in many languages, such as Java, C++, etc., but the team
decided to program it in Python because of the readily available documentation for media-related
projects, which have mostly been coded in Python, and because Python is the most convenient
and best-supported language on the RPi.

I. Preliminary Design:

i. Hardware:
The system will make use of the hardware components listed in Table 6 and described in
Section II.

Item | Available in COE store | Quantity
Raspberry Pi 2 Model B | N | 1
Raspberry Pi camera module | N | 1
Black Glasses | N | 1
Gomadic Portable AA Battery Pack | N | 1
Raspberry Pi enclosure | N | 1
8 GB SD card for Raspberry Pi preinstalled with Raspbian Wheezy | N | 1
Edimax 150 Mbps wireless Nano USB adaptor | N | 1
Push buttons | Y | 5
Breadboard | Y | 1

Table 6: Hardware Components

ii. Software:
The following software packages were used for the development of the system:
Raspbian Wheezy software

Raspbian Wheezy is a Debian-based free operating system that makes the Raspberry Pi
hardware run. More than a pure OS, it comes with multiple software packages and various
pre-compiled software combined in an efficient format for easy installation on the Raspberry Pi.
We will burn this OS onto our 8 GB SD card so we can start using the Raspberry Pi.

ABBYY OCR SDK Cloud
SVOX Pico TTS software
Integrated Development Environment: IDLE with Python 3.x support and the 'tkinter'
modules installed (normally installed as part of Raspbian)
Pure Python Pi-Camera (picamera) API
The software packages have been described earlier in Section II (subsection H: Technical
Approach).
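As an example of the pure Python camera API, the sketch below captures a still image with the 5-second self-timer described in the design objectives; the resolution matches the 5 MP camera module, and the output path is illustrative.

    from time import sleep
    from picamera import PiCamera

    camera = PiCamera()
    camera.resolution = (2592, 1944)        # full 5 MP frame of the camera module
    sleep(5)                                # self-timer: let the user steady the camera
    camera.capture('/home/pi/capture.jpg')  # overwrites the previous photo
    camera.close()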

iii. Network:
The communication protocols used:
Serial communication between the Raspberry Pi and the camera module (the camera plugs
into the CSI socket on the Pi, using I2C for control)
Wireless IEEE 802.11n (the Wi-Fi dongle connects to the wireless network for OCR)

iv. System Design:

An overall view of the system is shown in Figure 12. The camera will be attached to the
glasses, allowing the user to simulate the head angles used to read text. The camera is attached
at the bridge of the glasses in order to capture most of the text to be read. Earphones are
available for the user to listen to the audio result; alternatively, speakers can be used so that
others can listen as well. A simple design for the remote control being used can be seen in
Figure 10. The X button will be used to exit, or turn off, the system. The 1 button commands
the camera module to capture an image. The 2 button replays the most recent audio file. The 3
button pauses or plays the audio currently being played. The 4 button allows the user to
alternate between the system's main language (English) and its secondary language (French).
The push buttons on the clicker are debounced externally by connecting 10 nF capacitors in
parallel between each button and ground.
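A sketch of how the five clicker buttons could be read on the RPi is shown below. The BCM pin numbers are placeholders rather than the final wiring, and a software bouncetime supplements the external 10 nF debouncing capacitors.

    import signal
    import RPi.GPIO as GPIO

    BUTTONS = {            # placeholder BCM pins -> clicker functions
        17: "exit",        # X: shut down the system
        27: "capture",     # 1: capture an image
        22: "replay",      # 2: replay the most recent audio
        23: "play_pause",  # 3: pause or resume the current audio
        24: "language",    # 4: toggle between English and French
    }

    def on_press(channel):
        """Dispatch a button press to the matching handler."""
        print("button pressed:", BUTTONS[channel])

    GPIO.setmode(GPIO.BCM)
    for pin in BUTTONS:
        GPIO.setup(pin, GPIO.IN, pull_up_down=GPIO.PUD_UP)  # pressed = pulled low
        GPIO.add_event_detect(pin, GPIO.FALLING, callback=on_press, bouncetime=200)

    signal.pause()   # keep the program alive, waiting for button presses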

Figure 12: Overall configuration of the device with the camera, Raspberry Pi, remote control, and earphones.

In order to increase the functionality of the Raspberry Pi device, a stand has been
designed and built, as shown in Figure 13, allowing the user to sit comfortably without the
glasses and use only the remote control to read text. The camera will be placed on the stand,
held in a position that allows it to capture as much text with as little blur as possible from the
physical material placed below.

Figure 13: The stand with the camera module placed on top

The sequential workflow of the system is shown in Figure 14. The logical sequence of
operations is as follows: the user switches the system on using the battery pack; when the user
presses the Capture button on the clicker, the camera is activated and captures an image of the
text the user intends to read; the image is stored in the RPi memory and then sent to the
ABBYY cloud for OCR; the text file is received back by the RPi and TTS is applied to it;
finally, the audio output is played through the earpiece.

Figure 14: Workflow of Read2Me on RPi

J. Preliminary Cost Estimate:

Item | Available in COE store | Quantity | Cost (Dhs)
Raspberry Pi 2 Model B | Y | 1 | 209
Raspberry Pi camera module | N | 1 | 175
Black Glasses | N | 1 | 93
Gomadic Portable AA Battery Pack | Y | 1 | 116
Raspberry Pi Model B+ enclosure | N | 1 | 39
8 GB SD card for Raspberry Pi preinstalled with Raspbian Wheezy | Y | 1 | 80
Tactile push switches (clicker) | Y | 5 | 50
Edimax 150 Mbps wireless nano USB adaptor | N | 1 | 57
3.5 mm audio jack headphone | N | 1 | 30
Data for connection to OCR | N | | ?
OCR subscription | N | N/A | ?
TTS | N | N/A | 0
Total cost | | | 849 + ? Dhs

Table 7: The overall estimated cost

III. Read2Me: An Android Application

A. Previous Work:
In an effort to help blind individuals find their way around their current environment, an
application called Listen2dRoom [8] was developed. The application works by scanning the
room, using a wearable camera (as shown in Figure 15) or the smartphone's integrated camera,
for QR codes placed on objects around the room. The scan occurs from the left side of the room
to the right, and an audio output lists objects in three different ways using AT&T's TTS demo.
The first method lists items in sequential order, from the first item scanned to the last. The
second method groups objects together and lists them according to their location in the room
relative to the room's walls. The final way is to list the items spatially: items on the left side of
the room would be heard from the left of a headset, and items on the right side would be heard
from the right [8]. This application was tested with blind individuals to gather accurate opinions
on their feelings towards the application and its uses. It received positive input, as well as
opinions about what could be different in terms of customization and wearability.

Although this application is different from the project we are presenting, it is an example of
how smartphones can be used as a visual aid for those suffering from visual impairments. It also
gives us insight into what blind people would like in a phone application, allowing us to
adjust our Read2Me application to further suit their needs.

Figure 15: The wearable camera used during testing and the main view of the application [8]

The paper [9], Open Source OCR Framework using Mobile devices, discusses a
project based on the development of a complete open source OCR framework with subsystems
on a mobile platform. For this purpose, the open source OCR engine Tesseract is used for character
recognition and the Flite synthesis module for integrating Text-to-Speech ability. An image is
captured using a Microsoft Windows mobile phone camera and processed to be read out aloud
from the built-in speaker of the mobile. For the image processing, the existing open source
desktop technology is used to develop a complete OCR framework. The captured image is first
converted into a bitmap format, which is transformed into a single-channel intensity image. The
intensity bitmap is used by the OCR engine. The Tesseract OCR engine translates this
converted image into a text file with ASCII-coded characters. This ASCII text file is post-
processed to remove all non-alphanumeric characters before being fed into the speech
synthesizer. The total time for the entire system to capture an image and synthesize the text into
speech was around 8 to 12 seconds. Figure 16 illustrates the overall architecture of the
system.

Figure 16: The System Diagram of the Open Source OCR Framework [9]

From Figure 16, we can observe that the application's core components comprise a
simple GUI for user interaction, a DAI (Digital Audio Interface) for output of synthesized
speech, an adapter to transform the input image into data readable by the OCR engine,
an OCR engine, and a TTS synthesizer. The open source OCR engine and TTS tool were ported
to the mobile platform using the Visual Studio IDE and the Windows Mobile 5.0 SDK. However, the
system developed cannot produce efficient real-time results for the visually impaired, who
have to operate the mobile camera themselves. There will be image noise and distortion when the
mobile platform is used for capturing images. Moreover, images taken from a mobile camera have
to be converted into a bitmap format readable by the OCR engine, Tesseract. This overhead
of image conversion could be eliminated by using a digital camera device that supports
image capture in BMP. Also, running the OCR engine's complex image processing algorithms
requires a great deal of the mobile phone's computational power and memory. Since
Tesseract lacks dictionary cross-referencing, the ability to post-process the recognized text,
and other high-level semantic techniques, the accuracy of the recognized text is also
low.

The journal article in [10] on Camera Reading for Blind People describes the
development of a mobile application that allows a blind user to read printed text. It
integrates OCR and TTS tools so that a picture taken from the smartphone can be
converted into audio feedback. To choose an efficient OCR framework, a preliminary test
was performed by taking pictures of text with different layouts, sizes, lighting, and shapes. Then a
text file was transcribed from each of the images to compare the text received
after optical character recognition with the original text image. A Levenshtein distance
function was created to measure the similarity between the two strings of
optically recognized text and original text. The Levenshtein distance was calculated as the
number of operations needed to make the two strings equal. For instance, if the original string is
p = "kitten" and the OCR-ed string is t = "sitten", the calculated Levenshtein distance d(p,t)
will be 1, since only one character has to be substituted to make the two strings equal. Hence, the
lower the value of d, the better the framework performs. This test was conducted with 30
images being recognized by three different OCR frameworks, and the differences produced
between the original and OCR-ed text were measured. The results of the Levenshtein distance
computed for the three OCR frameworks are shown in Figure 17.
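To make the metric concrete, the following is a minimal Java implementation of the standard
dynamic-programming formulation of the Levenshtein distance (a textbook sketch, not the
authors' actual code):

    // Levenshtein distance via dynamic programming: d(p, t) counts the
    // insertions, deletions, and substitutions needed to turn p into t.
    public static int levenshtein(String p, String t) {
        int[][] d = new int[p.length() + 1][t.length() + 1];
        for (int i = 0; i <= p.length(); i++) d[i][0] = i; // delete all of p
        for (int j = 0; j <= t.length(); j++) d[0][j] = j; // insert all of t
        for (int i = 1; i <= p.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int cost = (p.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,    // deletion
                                            d[i][j - 1] + 1),   // insertion
                                   d[i - 1][j - 1] + cost);     // substitution
            }
        }
        return d[p.length()][t.length()];
    }
    // Example: levenshtein("kitten", "sitten") == 1 (one substitution, k -> s)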

Figure 17: Comparison of the median value of the string distance in images for the three frameworks:
Tesseract, Abbyy, and Leadtools [10]

It was observed from the above results that, although the commercial frameworks
Abbyy and Leadtools had better results, the research project was based on the free software
Tesseract due to project budget limitations. Furthermore, the TTS tool implemented in the
application was AVSpeechSynthesizer, which is supported by iOS 7 for human voice synthesis.
For better system optimization and efficiency in the use of their application, two additional
stages were also included: preprocessing and post-processing. In the preprocessing stage,
image filters such as CIColorControls and CIColorMonochrome were applied to improve
the image quality before feeding the images to the OCR engine, whereas in the post-processing
stage, a function was created to calculate an error rate percentage through a mathematical formula.
If the error rate exceeded a defined value, the user was prompted to repeat the process of capturing
the text image. The system architecture of their work is shown in Figure 18.

Figure 18: System architecture for Camera Reading for Blind [10]

Since the application is aimed at blind users, one limitation of this product is that it
is difficult for a blind person to position the mobile camera properly for image capture.
The user will require some external assistance, as image capture is not implemented
as an automatic process. Also, the user has to purchase an iPhone to be able to
access the application.

B. Statement of Problem:
In the context of the research and development discussed above, a number of technical
approaches have been embraced to develop an image-based application. However, most of the
work implements the OCR engine locally on the mobile platform. This
has led to certain limitations owing to the limited hardware, power, and memory resources of the
phone. A large number of pixels must be processed within limited on-board
memory, in contrast to desktop systems with faster CPUs and additional
virtual memory. The text image undergoes several layers of processing before the
final result is sent for speech synthesis, and this exhausts a great deal of the mobile phone's
computational capacity. Moreover, these applications lack a user-friendly interface that can
guide visually impaired people through the application. Therefore, in order to
address these problems, we intend to develop a modular application that uses cloud
services for OCR and the built-in Android TTS to produce an audible rendering of the text file.
Furthermore, interactive voice labels will be integrated into the application so that the user can
easily navigate through it.

C. Functional Requirements:
The application should have the following functional requirements:

FR1. The system must automatically take pictures of the intended reading material after a timer of
up to 3 seconds, using the phone's default camera.

FR2. The system must take photos of the text to be converted into audio.

FR2.1. The photos will be stored in the Android phone's default image directory until
they are sent to the Abbyy OCR Cloud.

FR3. The system must send the photos to the Abbyy OCR Cloud.

FR3.1. The photos will be pre-processed and then converted into text.

FR3.2. The resulting text file will be downloaded back onto the Android phone.

FR4. The system must convert text into audio on the mobile phone itself.

FR4.1. The text file resulting from FR3 will be processed into audible speech.

FR5. The system must play back the audio through either the integrated speakers or a headset.

FR6. The system must allow the user to play, pause, rewind, and fast forward the audio currently
being played to them.

FR7. The system must include a user interface with buttons large enough for the user to locate
and select each one individually.

FR7.1. The buttons must read aloud what they do once they are tapped once.

FR7.2. The buttons must do their intended action once they are double-tapped.

D. Non - Functional Requirements:


The non-functional requirements for the application include:

NFR1. The photos taken by the Android phone's camera should be clear.

NFR2. The OCR software should clean up the picture taken for accurate conversion.

NFR3. The text file resulting from the OCR software should be formatted neatly and
consistently.

NFR4. The audio file resulting from the TTS software should be clear and played at an
understandable speed.

NFR5. The Android phone should not use too many resources while running the application.

NFR6. The system should convert the image to audio as fast as possible.

E. Design Objectives:
The design objectives can be summarized as follows:

- Since the target audience of this application is the visually impaired, the application
  will have supporting voice commands on every button to help the user navigate while
  performing a specific task, e.g. capturing images or selecting the desired language.
- A first-time user will have to set the ABBYY application ID and password. Unless the ID
  and password are provided, the Capture button will remain disabled.
- The default language for reading out text is English.
- The user can change the language to Spanish with a double tap on the Language button. It
  can be changed back to English by a subsequent double tap.
- The device will be able to scan and read a wide range of English and Spanish
  printed material, including books, bills, documents, etc.
- After selecting the desired language, the user can start capturing images with a double tap
  on the Capture button of the home screen.
- If the user wants, he/she can also select an existing image for reading from the image
  gallery, by selecting the From File option located on the task bar.
- The image taken by the phone camera will be saved in the default image gallery or on the
  external SD card of the phone (if present).
- The ABBYY cloud OCR will transform the images into a readable text format.
- The ABBYY OCR provides dictionary support for both Spanish and English,
  ensuring higher accuracy in scanning and processing the image text.
- The default Android TTS engine will be used to synthesize human speech from the
  processed text file.
- The seek bar of the audio speech will allow the user to view the playback progress.
- The user will be able to play/pause, rewind, and forward the audio by selecting the
  Play/Pause button or dragging the seek bar, respectively.
- A Wi-Fi or 4G connection should be configured on the mobile phone to communicate
  with the cloud server for image processing.

F. Design Constraints:
Taking images with the embedded camera of a mobile device can introduce multiple
distortions that can make even the best OCR engine fail. This section discusses
some of the issues that arise while using this application and possible approaches
that can be applied to redress these constraints.

Lighting Condition:
Issue: Unbalanced lighting in the background, due to factors such as
shadows, reflections, etc., can deteriorate the quality of the captured image. Furthermore,
enabling the camera flash can cause glare, leading to more complications in processing
the image.
Proposed solution: In a poorly lit environment it is advisable to set a larger
aperture value and disable the flash. The user should use additional light sources positioned
in such a way that no shadows fall on the text document.

Perspective Distortion (Tilt):
Issue: Tilt happens mostly when the plane of the document is not parallel to
the imaging plane of the camera. It is more common if the mobile phone with the
embedded camera is handheld.
Proposed solution: To resolve this orientation problem, the user can mount the phone
camera on a tripod stand. This will help position the lens parallel to the text plane before
capturing.

Text Misalignment:
Issue: This misalignment occurs when only a partial text region is captured by the
camera. This results in irregular character shapes being sent to the OCR engine. It
also leads to loss of data being imported to the OCR cloud.
Proposed solution: This problem can be addressed if the phone camera is placed
on the tripod stand (as shown in Figure 19) and the text document is kept on its base,
allowing the camera to capture a panoramic view of the document. The distance between
the camera and the document on the base of the tripod should be sufficient to fit
the whole document into one frame. The recommended distance is between 70 and 80 cm.

Blur (Out of Focus):

Issue: Most phone cameras are able to function at various
distances. At longer distances, the images are more likely to be blurred if the text
document is moving, the lens is unclean, the phone is not held steady, or the
focus is on the background rather than the subject (the document).
Proposed solution: To avoid blurred images, it is advisable to use a
phone camera that has an autofocus capability, so that when the camera is started it
focuses on the text document automatically. In addition, the tripod should be used to
ensure the stability of the document and the phone camera.
Internet Connectivity:
Issue: An Internet connection is required to use the Read2Me application. It is
also important to have an adequate Internet speed in order to ensure that the time taken
for uploading the image and downloading the result is less than the time taken for processing the
document.
Proposed solution: The phone application can be used either in a Wi-Fi
environment or over a 3G data connection. The minimum speed of the Internet connection should be
10 Mbps.
Input Image Font and Font Size:

Issue: The ABBYY OCR service cannot produce accurate results if the image's font
size is too small, that is, less than 5 pt. Even a normal 400 dpi resolution image with a
very small font size will severely affect the recognition quality. Moreover, the ABBYY OCR
cloud service is designed to read only a limited set of font types, which were discussed
earlier in Section II (subsection H: Technical Approach). Any other font type in the
image will not be recognized by the OCR software and will produce a null or gibberish result.

Proposed solution: If the font size of the text is too small, it is best to
increase the resolution of the camera to between 500 and 600 dpi.

Figure 19: Stand for the Android Phone

G. Technical Approach:
The application sends the images captured by the default (back) camera of the phone
to the OCR cloud through the Internet, which connects the phone to the OCR cloud
system. The images captured by the camera are stored in the Gallery on the phone's external
storage (or internal storage if no SD card is present). Once the image is processed, a .txt file is
returned to the phone, on which the Android TTS is applied.

There were several alternatives for the various software components used for
this application. The team analyzed the possibilities and selected the optimal
option.

i. Selection of OCR Software:


The team considered the following options for OCR on the android device:

Tesseract:
Tesseract is an OCR engine developed by HP in 1985. Nowadays, it is being
improved by Google. It is written in C/C++. To work with Tesseract on Android,
the tess-two library needs to be installed first. Once the files for this library are installed, the
engine's API can be used to obtain the OCR-ed text. The drawback of this engine is that the
initial accuracy is low (around 50%). However, it can be improved by training, for which
separate training files need to be installed and run. One of the major features of Tesseract
is the multi-language support introduced in Tesseract 3.02. This implies that an image which
contains text from two different languages can be OCR-ed as well, by simply using the
command below, which uses a combination of English and Japanese:

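// init() loads the trained data from tesseractDatapath for the listed languages;
// here "eng+jpn" requests combined English + Japanese recognition (tess-two TessBaseAPI call):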
tesseract.init(tesseractDatapath, "eng+jpn");

The accuracy of this feature is also low, because Asian languages (block characters)
do not work well with Western languages; hence "eng+jpn" or "jpn+eng" itself yields poor
results with the original Tesseract [16]. Also, Tesseract works offline, which means no Internet
connection is required, but it also implies that heavy processing is done on the
phone itself, which drains the phone's battery sooner.

ABBYY OCR:
ABBYY provides a Mobile OCR Engine for local image processing as well as a
Cloud OCR SDK for image processing on a cloud server. Both of ABBYY's OCR engines
have far higher accuracy than Tesseract and require no training to be able to
recognize different text types. In [17], after running tests comparing recognition
accuracy, it was found that ABBYY has an accuracy of 95.96% compared to 89.78% for
Tesseract.

After establishing that it would use OCR services provided by ABBYY, the team
opted for the Cloud OCR SDK over the Mobile OCR Engine after evaluating the usage scenarios,
attributes, and development and deployment methods summarized in Appendix A.

The rest of the ABBYY OCR Cloud features remain the same as discussed in the
previous chapter, for example the text types recognized and the Web API functionality.
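To illustrate the Web API usage, below is a minimal, hedged Java sketch of one Cloud OCR round
trip. It assumes the ABBYY Cloud OCR SDK REST endpoints processImage and getTaskStatus with
Basic authentication using the application ID and password; error handling, XML parsing, and
task polling are deliberately omitted, and the class name is ours, not part of the SDK:

    // Hedged sketch only: submits an image to the ABBYY Cloud OCR SDK and returns
    // the raw XML response (which contains the task id to poll via getTaskStatus).
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Base64;

    public class AbbyyClient {
        public static String submitImage(String appId, String password, String imagePath)
                throws Exception {
            URL url = new URL("https://cloud.ocrsdk.com/processImage?language=English&exportFormat=txt");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            String auth = Base64.getEncoder()
                    .encodeToString((appId + ":" + password).getBytes("UTF-8"));
            conn.setRequestProperty("Authorization", "Basic " + auth);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(Files.readAllBytes(Paths.get(imagePath))); // raw image bytes as body
            }
            try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                return in.readLine(); // XML with the task id; poll getTaskStatus until "Completed"
            }
        }
    }

Note that https:// is used, as ABBYY advises (see Appendix A), and that on Android such a call
would run off the main thread.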

ii. Selection of Internet Connectivity:


The team considered two possible ways of establishing Internet connectivity on the
mobile:

The phone's built-in Wi-Fi adapter:

The phone's built-in Wi-Fi adapter can be used to connect to any wireless network, but the
speed of the Internet connection will depend on the wireless network being used. This
option is only suitable for indoor use or places where a wireless network is available.
Although connecting to password-protected networks requires entering the password, which
might be difficult for the visually impaired to do, the person can ask someone to connect
their phone to the Internet.

4G Data:

The user can purchase a data package from the telecom network that he/she is using. The
application will respond depending on the speed of the service purchased. This is a good option
if the user wants to use the application on the move. If the user enters an area with no
signal, unfortunately the application will not work.

The team decided to go with the mobile's built-in wireless adapter, since it incurs no
additional expense.

iii. Selection of TTS Software:


Following options were considered for TTS on Android:

Android TTS:

Android TTS was released in version 1.6 of the Android platform. This TTS engine is built
into almost all Android devices. The TTS engine that ships with the Android platform supports
a number of languages: English, French, German, Italian, and Spanish. Also, depending on which
side of the Atlantic you are on, both American and British accents for English are supported. No
library needs to be installed on Android before using it. A simple TextToSpeech object
needs to be created in the Java code, and its functions can then be used. A
successful check is marked by a CHECK_VOICE_DATA_PASS result code, indicating
the device is ready to speak, after the creation of our android.speech.tts.TextToSpeech object. If
not, we need to let the user install the data that is required for the device to become a
multilingual talking machine. Downloading and installing the data is accomplished by firing off
the ACTION_INSTALL_TTS_DATA intent, which will take the user to the Android Market and
let her/him initiate the download (this happens only if the Android TTS isn't already
installed on the phone). This TTS engine is free to use but supports only a limited number of
languages.
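A minimal sketch of this initialization and a first utterance is shown below (the class name and
spoken string are illustrative, and error handling is simplified):

    // Minimal sketch: initialize the built-in Android TTS engine and speak a string.
    import android.content.Context;
    import android.speech.tts.TextToSpeech;
    import java.util.Locale;

    public class SpeechHelper implements TextToSpeech.OnInitListener {
        private final TextToSpeech tts;

        public SpeechHelper(Context context) {
            tts = new TextToSpeech(context, this); // onInit is called when the engine is ready
        }

        @Override
        public void onInit(int status) {
            if (status == TextToSpeech.SUCCESS) {
                tts.setLanguage(Locale.US); // or Locale.GERMAN, Locale.FRENCH, etc.
                tts.speak("Read2Me is ready", TextToSpeech.QUEUE_FLUSH, null);
            }
        }
    }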

Google TTS:

Google Text-to-Speech powers applications to read the text on the screen aloud. To use
Google Text-to-Speech on an Android device, go to Settings > Language & Input > Text-to-
speech output and select Google Text-to-speech Engine as the preferred engine. It is a library
application, which means it does not have a user interface to interact with; rather, it offers
an interface for other applications to use its functionality. Other apps on the phone can give text to
Google Text-to-Speech for it to speak out loud. The supported languages are: Cantonese, Dutch,
English (India), English (United Kingdom), English (United States), French, German, Hindi,
Italian, Indonesian, Japanese, Korean, Mandarin, Polish, Portuguese (Brazil), Russian, Spanish
(Spain), Spanish (United States), Thai, and Turkish.

Acapela TTS:

Acapela TTS for Android has been designed for the Android developer community,
offering a high-quality speech engine. Acapela TTS for Android consists of static libraries
compatible with Android versions from 2.x to the latest 4.x. Acapela is also multilingual and
supports up to 20 languages. However, it is not free, and developers must use its paid API to
access its services. It also requires an Internet connection to work.

Based on the above reasoning, we decided to use the Android TTS. Google TTS is a good
option here too, so the user can install it on his/her phone and use that engine
instead to read out the text. When the audio is ready, the user will be prompted to select the
desired TTS engine (if more than one TTS engine is installed on the phone).

iv. Selection of Software for Play/Pause/Forward/Rewind Functionality:

Android has a built-in MediaPlayer class which can be used by simply importing it
into the Java class (import android.media.MediaPlayer). Once an object of this class is created,
its functions, such as pause() and start(), can be used for the pause/play functionality.

For rewind and forward, a seek bar has been used. After getting the current progress of
the seek bar, simple addition and subtraction determine the position to which the audio
file needs to be forwarded or rewound. The team has set a fixed time to seek forward or
backward (in this case 5000 ms). After the calculation, the media player simply seeks to that
specific position in the audio file.
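A condensed sketch of this logic is shown below; the class name and file path are illustrative,
not the application's actual code:

    import android.media.MediaPlayer;
    import java.io.IOException;

    public class AudioControls {
        private static final int SEEK_MS = 5000; // fixed forward/rewind step from the text
        private final MediaPlayer player = new MediaPlayer();

        public void load(String path) throws IOException {
            player.setDataSource(path); // e.g. the synthesized speech file (illustrative path)
            player.prepare();
        }

        public void togglePlayPause() {
            if (player.isPlaying()) player.pause(); else player.start();
        }

        public void forward() { // jump 5000 ms ahead, clamped to the track length
            player.seekTo(Math.min(player.getCurrentPosition() + SEEK_MS, player.getDuration()));
        }

        public void rewind() { // jump 5000 ms back, clamped to the start
            player.seekTo(Math.max(player.getCurrentPosition() - SEEK_MS, 0));
        }
    }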

v. Selection of Target Device:

The target device can be any Android device with a minimum of a 100 MHz CPU
and 16 MB of RAM. However, in order to achieve faster performance when running the
application, we would recommend a minimum specification of 400 MHz and 128 MB of RAM.
The device's camera must have a minimum resolution of 5 MP. For our application, it is
advisable to keep the resolution of the captured image between 150 and 600 dpi. If the
resolution is below 150 dpi, some details of the image might be missed during the recognition
process. If the resolution is over 500 dpi, loading and processing the image will take more time
without greatly refining the recognition quality.

Other recommended requirements for the camera include:

- A disable-flash feature
- Manual aperture control
- An autofocus lens
- Adjustable optical zoom
- An anti-shake system

The application is compatible with all API levels from 15 (Ice Cream Sandwich) to 23 (Marshmallow).

H. Preliminary Design:
i. Hardware:
The application only requires a smartphone as its hardware.

Item | Available in COE store | Quantity
Android Smartphone (Samsung Galaxy S4 OR HTC One M8) | N | 1

Table 8: Hardware Components (Android App)

ii. Software:
Software packages needed:

Android Studio (IDE for Android programming)
Android local TTS
ABBYY OCR SDK Cloud

Android is the most widely used Linux-based operating system for smartphones and tablet
devices. Since our product targets a mass audience of visually impaired people, we
prioritized the very popular Android platform over the Windows, iOS, and Blackberry
environments. Also, Android's notable features, such as its open source platform, multiple
screens for multitasking, custom ROMs, and open source libraries for Text-to-Speech synthesis,
gave it the edge over the other operating systems.

iii. Network:
Communication protocols used:

Wireless IEEE 802.11n (For connecting to a wireless network)


All-IP 3G (For Data)

56 | P a g e
iv. System Design:

Figure 20: (a) The welcome screen of the Read2Me Android Application. (b) Main screen of the application.
(c) Settings Activity. (d) Processing. (e) Result of OCR. (f) Text being read out loud.

The Read2Me Android application contains a simple user interface that can be used by
anyone, including the visually impaired. The main menu buttons, as seen in Figure 20b above,
are large enough to cover the entirety of the Android phone's screen. The button sizes are
adjusted according to the current phone's screen size, so they will have the same proportions on
different screen sizes. The buttons on the main menu not only have a large appearance to aid the
user, but they also play a sound that tells the user what the button's purpose is when the button is
pressed once. For example, tapping the language button once will verbally tell the user that it
will change the language to German if the current language is English ("Change language to
German"), and vice versa. Tapping a button twice will commit the action. Using the previous
example, tapping the language button twice will change the current language of the application
and verbally inform the user that the language has been changed ("Language changed to
German").
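This single-tap-announces / double-tap-commits behavior can be implemented with Android's
GestureDetector. The helper below is an illustrative sketch only; the class and parameter names
are ours, not the application's actual code:

    import android.content.Context;
    import android.speech.tts.TextToSpeech;
    import android.view.GestureDetector;
    import android.view.MotionEvent;
    import android.view.View;

    public class TalkingButton {
        // Wires a view so a single tap speaks its label and a double tap runs its action.
        public static void attach(Context context, View button, final TextToSpeech tts,
                                  final String label, final Runnable action) {
            final GestureDetector detector = new GestureDetector(context,
                    new GestureDetector.SimpleOnGestureListener() {
                @Override public boolean onSingleTapConfirmed(MotionEvent e) {
                    tts.speak(label, TextToSpeech.QUEUE_FLUSH, null); // announce purpose
                    return true;
                }
                @Override public boolean onDoubleTap(MotionEvent e) {
                    action.run(); // commit the action on the second tap
                    return true;
                }
            });
            button.setOnTouchListener(new View.OnTouchListener() {
                @Override public boolean onTouch(View v, MotionEvent event) {
                    return detector.onTouchEvent(event);
                }
            });
        }
    }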

Once the camera button has been tapped twice, the user is informed that a picture is
being taken. After a 5-second timer, which gives the user enough time to position the
camera, the picture is taken and automatically uploaded to the ABBYY OCR Cloud,
where it is converted into text and then downloaded back onto the Android device. The
downloaded text is then converted into audio using the Android phone's local TTS engine and
read aloud to the user. There are buttons to control the audio being played, as seen in
Figure 20c. Rewind, pause, play, and fast-forward capabilities are included so that the user can
control the text they are attempting to read.

Two more functionalities, Settings and choosing an image for OCR from the gallery, are
part of this application. Since both of these functionalities do not apply to blind users, the
team decided to place these buttons in the menu so that a blind person does not come in
contact with them. The Settings icon on the task bar (see Figure 20b) navigates the user to a
login screen (see Figure 20d) where the user needs to enter the username and password for the
OCR cloud services. This only needs to be done once, when the application is first installed
on the phone. The user will be prompted to enter a new ID and password only when the current
credentials have expired.

The photos taken by the application are stored in the phone's default photo gallery. The
gallery icon shown in the menu task bar (see Figure 20b) acts as a From File button; once
tapped twice, it opens the gallery so the user can pick one of the stored photos. The audio files,
however, are not stored, and are deleted as soon as the user is finished with them. This is because
the audio files use a greater amount of the phone's memory than the photos being taken.

v. System Architecture:
The system architecture comprises the software structure and hardware components that
constitute the overall working of the Read2Me application. The following diagram illustrates an
overall view of the system architecture of the Read2Me application:

Figure 21: System Architecture of Read2Me Application

The application workflow is shown in Figure 22 below:

Figure 22: Read2Me Application Workflow

I. Preliminary Cost Estimates:
i. System Cost:

Item | Quantity | Cost (Dhs)
Samsung Galaxy S4 Android Phone, 16 GB | 1 | 700
Internet connection (assuming wireless) | 1 | 0
Total Cost => | | 700 Dhs.

Table 9: System Costing (App)

The cost of earphones, an SD card, and the stand has not been included because they are
optional. The user might decide to use mobile data instead of a wireless network, the cost of
which will depend on the type of service purchased.

ii. Design Cost:

Item | Time (Days)
Abbyy OCR Cloud API | 14
Android Application | 60

Table 10: Design Costing (App)

IV. Testing

Criteria over which the decision analysis will be carried out include (starting with the highest
priority):

- Accuracy
- Latency
- Usability
- Power Consumption
- Portability
- Weight
- Cost

A. Testing:
Both approaches, Read2Me on RPi and the Android application, were tested against the criteria
above, and the results are summarized below.

The application was installed on a Samsung S4 for testing, and the pictures shown in Figure 23
were taken to read.

Figure 23: (a) Picture for German OCR (b) Picture for English OCR

For the Raspberry Pi device, we used the following images:

Figure 24: (a) Picture for English OCR (b) Picture for French OCR

Accuracy:

i. Read2Me on RPi:
The accuracy of the RPi in text recognition was 99% for a text image of font size 16. Only a
few letters were inaccurate, because the RPi camera's frame cropped the top of the
page. The audio file produced 99.9% accurate results in reading. Images with a font size of 14 pt
produced an accuracy of 60%, whereas with a font size of 12 pt the accuracy was 30%. The audio
file in the respective scenarios had an accuracy of 65% and 32%, as most of the letters
misrecognized by the OCR engine were misread.

ii. Android Application:

The pictures (German and English) captured were OCR-ed with 99.9% accuracy, with only
2 letters being misread because they were underlined (a y was read as a v) or faded (an
E was read as an F). The audio file that was then created was 100% accurate, with no words
misread.

The application was also tested with an 8 pt font. The results were accurate.

Latency:

i. Read2Me on RPi:
The different font sizes of the test text gave different results for the time taken to convert the
image into a text file and then into audio. The font used for all sizes is Times New Roman.

- For size 16: 14 seconds
- For size 14: 20 seconds
- For size 12: 27 seconds

These times were measured from the moment the picture began uploading to the ABBYY OCR
Cloud until the TTS finished synthesizing the resulting text into audio and began playing it aloud.
Due to the quality of the camera being used with the RPi, smaller font sizes take longer to convert
into audio, as they appear more blurred. With the size 12 font, an error would occasionally be
returned, stating that the image could not be processed.

ii. Android Application:

The application OCR-ed the English text, containing 334 words, in 12 seconds. The text
was 12 pt Times New Roman. For German, the application took 13 seconds for 327
words in a 12 pt font, as shown in Figure 25. Creating the audio file takes about 5 seconds for
both languages. Therefore, the total time from capturing the image to
hearing the speech output is about 17 seconds.

Figure 25: OCR output for Android Application (English and German)

Usability:

i. Read2Me on RPi:
The RPi device was encased in plastic to allow the user to handle it easily. The wires used for the
connections on the remote control were gathered and wrapped together to prevent the user from
becoming entangled. The ribbon cable for the camera module was attached to the glasses so that it
hangs down from the side of the user's head, keeping it away from their face. The buttons on the
remote control were spaced such that the user would not press two buttons at the same time and
would be able to distinguish between them. The buttons are large enough for the user to feel
beneath their fingers, and do not require too much force to be pushed.

ii. Android Application:

The application has voice labels on the buttons, as well as confirmations to reassure the blind
person of what the application is currently doing. However, playing/pausing the audio can be an
issue, since there are no voice labels for those controls, and a blind person may face some
difficulty finding those buttons on the touch screen. Moreover, a blind person can also touch
any of these buttons by mistake, but this will be easily detected by the user because,
on touching any of the buttons, the corresponding action is committed immediately.

Power Consumption:

i. Read2Me on RPi:
Below are screenshots of the Task Manager before and after running Read2Me:

Figure 26: The amount of CPU power being used by the system before running Read2Me is 1%

67 | P a g e
Figure 27: The amount of CPU Power being used while the text is being converted into audio is 26%.

Overall, the CPU usage during the text recognition and conversion process is 26%, and the process
occupies around 3 MB of memory. Thus the Read2Me RPi system is efficient and economical.

ii. Android Application:


The application takes 10.39 MB of memory when installed. To check how much battery the
application uses, the team used the Smart Manager pre-installed on the Samsung S4. Below
are the screenshots from the Smart Manager, which show that the application, when used only
twice, takes up 1% of the battery, 33.17 MB of RAM, and 3.32% of CPU, as seen in
Figure 28.

Figure 28: Power and RAM Consumption For Read2Me Application

Portability:

i. Read2Me on RPi:
The RPi device requires the glasses, the encased device, and the remote control to be carried
around. The ribbon cable is flexible and can be rolled up. The glasses can be folded without
damaging the camera. The remote control used for capturing images is thin enough to be
carried without taking up too much space.

The dimensions of the Raspberry Pi are 85.60 mm x 56 mm x 21 mm, and the remote control
measures 5.5 cm x 8.5 cm, which makes the whole system very portable and easy to carry and
operate.

ii. Android Application:


The application requires only a phone with an Internet connection, which is highly portable.

Weight:

i. Read2Me on RPi:
The Raspberry Pi weighs around 65 g (including the case) and the breadboard weighs only 35 g.
Therefore, the total weight of the system is approximately 100 g.

ii. Android Application:

The phone used for testing was the Samsung S4, which weighs only 130 g.

Cost:
A free trial of the OCR service was used, so no cost has been considered for it.
Moreover, the cost of the Internet connection has also not been considered, because
AUS_Wireless was used for testing. If the OCR service and the Internet connection were
purchased, the cost of using them would be the same in both approaches and can hence
be excluded for the purpose of comparison.

i. Read2Me on RPi:
The Raspberry Pi 2 Model B costs 209 AED. The camera module, sold separately, can be
bought online for 175 AED. Glasses and headphones, if not already owned, can be bought
anywhere at any price, depending on the quality. The battery pack used, without the batteries,
costs about 116 AED. The case for the Raspberry Pi costs 39 AED. The five push buttons used
for the remote control cost 50 AED altogether. The USB wireless adapter used for Internet
connectivity costs 57 AED.
The total cost of this system is approximately 646 AED.
ii. Android Application:

The cost of the Samsung S4 varies depending on where it is purchased, but it ranges from 700 to
900 AED.

V. Comparison between Two Approaches

A decision analysis between the two approaches was carried out based on the testing described
in Section IV.

(Bar chart: value-index scores for the Read2Me Android App versus Read2Me on RPi across the
attributes listed above, on a scale from 1 = worst to 10 = best.)

Figure 29: Analysis chart between Read2Me Android Application and RPi

Based on the above design analysis and criteria, it can be deduced that both technologies
have their pros and cons; however, Read2Me on RPi is more user-friendly for the blind. The
accuracy of the RPi approach is equivalent to the application's provided that the font size is 16 pt
or more; therefore, the team deems the RPi approach better than the Android application.

VI. Project Management

A. Preliminary Schedule:
Task Name | Duration | Start | Finish | Predecessors | Resource Names
Review of COE 490 and Planning for COE 491 | 4 days | Sun 9/6/15 | Wed 9/9/15 | |
Review of COE 490 | 1 day | Sun 9/6/15 | Sun 9/6/15 | | Anza, Ragini, Heba
Discussion of the progress done so far | 2 days | Mon 9/7/15 | Tue 9/8/15 | | Anza, Ragini, Heba
Studying the OCR Cloud API for Android application | 2 days | Tue 9/8/15 | Wed 9/9/15 | | Heba, Ragini
Studying the OCR Cloud API for RPi | 2 days | Mon 9/7/15 | Tue 9/8/15 | | Anza
Final Report Work and Poster | 72 days | Sun 9/6/15 | Mon 12/14/15 | |
Divide the Final Report work and decide on the format of the report | 1 day | Sun 9/13/15 | Sun 9/13/15 | 2,3 | Anza, Ragini, Heba
Design the poster for the Senior Design Competition | 14 days | Thu 10/15/15 | Tue 11/3/15 | | Heba
Assign responsibility of weekly logs | 1 day | Sun 9/6/15 | Sun 9/6/15 | | Anza, Ragini, Heba
First draft submitted to advisor | 7 days | Sun 9/27/15 | Mon 10/5/15 | 7 | Anza, Ragini, Heba
Revised and modified report | 2 days | Mon 10/26/15 | Tue 10/27/15 | 10 | Anza, Ragini, Heba
Second draft of report submitted | 1 day | Wed 10/28/15 | Wed 10/28/15 | 11 | Anza, Ragini, Heba
Submit final draft to adviser and examiners | 5 days | Sun 11/29/15 | Thu 12/3/15 | 12 | Anza, Ragini, Heba
Review the poster, finalize and print it | 2 days | Sun 12/13/15 | Mon 12/14/15 | 13 | Anza, Ragini, Heba
Implementation | 61 days | Sun 9/6/15 | Sun 11/29/15 | |
Implement OCR on RPi and Android | 14 days | Sun 9/6/15 | Wed 9/23/15 | 4,5 | Anza, Ragini
Implement TTS on RPi and Android | 11 days | Tue 9/29/15 | Tue 10/13/15 | 16 | Anza
Implement the Play/Pause/Replay functionality on RPi and Android | 3 days | Wed 10/14/15 | Sun 10/18/15 | 17 | Ragini
Add language option (French) on RPi | 2 days | Sun 10/18/15 | Mon 10/19/15 | 17 | Anza
Add voice guidance on Android app | 3 days | Wed 10/7/15 | Sun 10/11/15 | 16 | Anza
Add credentials-changing functionality to Android app | 1 day | Thu 10/15/15 | Thu 10/15/15 | 20 | Anza
Ask Mechanical Eng. Department to manufacture a stand for the system | 1 day | Tue 9/29/15 | Tue 9/29/15 | | Anza, Ragini, Heba
Test the system by visiting the Al-Thiqah club | 1 day | Sun 11/29/15 | Sun 11/29/15 | 21 | Anza, Ragini, Heba
Presentation Preparation | 11 days | Tue 12/15/15 | Tue 12/29/15 | |
Prepare presentation slides | 6 days | Tue 12/15/15 | Tue 12/22/15 | 13,14 | Anza, Ragini, Heba
Rehearse parts | 4 days | Wed 12/23/15 | Mon 12/28/15 | 25 | Anza, Ragini, Heba
Presentation Day | 1 day | Tue 12/29/15 | Tue 12/29/15 | 26 | Anza, Ragini, Heba
Table 11: The overall schedule of the project

B. Gantt Chart:

Figure 30: Gantt Chart

Figure 31: Gantt Chart (continued)

VII. Standards

Following international standards is a key element in ensuring the safety and quality of any
project or product. Since our proposed project deals with the communication of different devices,
we will be using the standards related to systems engineering, namely the ISO/IEC
standards.
A. ISO/IEC JTC 1/SC 31 - Automatic identification and data capture techniques

ISO 1073-1:1976: Alphanumeric character sets for optical recognition -- Part 1:
Character set OCR-A -- Shapes and dimensions of the printed image

Describes the forms of printed images and the sizes of alphanumeric characters as
well as the signs and graphical symbols (OCR-A) intended for optical character
reading according to ISO 646-1973.

ISO 1073-2:1976: Alphanumeric character sets for optical recognition -- Part 2:
Character set OCR-B -- Shapes and dimensions of the printed image

Indicates the forms of printed images and the sizes of alphanumeric characters, as
well as the signs and graphical symbols (OCR-B character set), intended for
optical character reading according to ISO 646-1973.

B. ISO/IEC 25010:2011- Systems and software engineering -- Systems and software Quality
Requirements and Evaluation (SQuaRE) -- System and software quality models

Defines a quality in use model composed of five characteristics (some of which
are further subdivided into sub-characteristics) that relate to the outcome of
interaction when a product is used in a particular context of use. This system
model is applicable to the complete human-computer system, including both
computer systems in use and software products in use.

C. IEEE 802.11n- IEEE Standard for Information technology-- Local and metropolitan area
networks-- Specific requirements

The IEEE 802.11n Wi-Fi/WLAN standard uses technologies including OFDM and
MIMO to provide high-speed data transport at a peak of 600 Mbps.

VIII. Societal Impact
OCR technology is rapidly evolving into an instrumental part of our everyday lives.
Even though the applications of OCR technology fall into various categories, such as
business, teaching, and medicine, its most effective and efficient application is
for the disabled. In this context, Read2Me combines OCR technology with a speech
synthesis tool to make reading an easy task for the visually impaired. It eliminates the need to
learn Braille, which can take blind individuals years to learn fluently. The goal of this
project is to make the best use of the available technology in order to remove difficulties
from the lives of these people. Read2Me will significantly speed up the reading process using
OCR, without the text having to be manually transcribed from an image.

Our proposed project is set to give greater independence to the visually impaired by not only
allowing them to read text of their own choice, but also to identify business cards and read menus,
labels, or directions on a board. Another advantage of this device is that the user does not have
to install any additional hardware or software, and they can start reading a document anywhere,
anytime.

To make our system more scalable, our proposed design approaches suggest the use of a
standalone platform, such as a stand on which the camera of the RPi or the mobile phone itself
can be placed to take images. The system can therefore extend its usability to a wide range of
people besides the visually impaired. The product can be used as a literacy support for people
who are learning to read or who cannot read, such as small children and even individuals with
dyslexia.

All in all, Read2Me can serve as a complete, robust package to enhance the lives of the
visually impaired people to a great extent.

IX. Conclusion

Assistive technologies have been evolving rapidly, and they are a major step in aiding the
blind and visually impaired (BVI) in educational preparation, for work, and in employment. The
use of these technologies has helped the BVI access information that was previously out of
their reach. There have been various solutions and improvements in the area of assisting the
blind to read; however, the technology has largely been limited to braille, which the blind must
first learn. Other technologies that have eliminated the need to learn braille have so far been
limited to research, and their functionality is restricted to reading only. Our proposed project is
set to give greater independence to the visually impaired by not only allowing them to read books
of their own choice, but also to identify business cards and read menus, labels, or directions on a
board, as long as they are in English.

X. Future Prospects
MOTION DETECTION
Read2Me could be scaled to include a Passive Infrared (PIR) motion sensor to detect motion
from pets or humans from about 20 feet away. This could be helpful for the blind because it
would give them the confidence of knowing if there is anyone within 20 feet, and it would
be an indication for the blind to be careful while walking.

SECURITY
Read2Me could be made secure by using a fingerprint sensor. Security might be of interest to
some users if they desire to keep GPS locations private.

DISTANCE SENSORS
The product can use Infrared (IR) distance sensors, also known as IR break-beam sensors, to
determine how close the nearest object is (for distances over 1 m). This will also boost the blind
user's confidence and would alert them if they are about to approach any object.

SIRI-LIKE APPLICATION
The Raspberry Pi could host a Siri-like application which would allow the user to communicate
with the glasses. To implement this application, the RPi needs a listener/talker pair to develop a
voice user interface (VUI). We decided not to implement this because we wanted to limit the
scope of our project by eliminating the VUI.

SHARING FACILITY
The user could also have the add-on feature to share the book that he/she is reading, i.e. the
audio output of the glasses, with any other user possessing a similar earpiece and present within
the same wireless network. For this, the Raspberry Pi must be connected to the Internet or via
Bluetooth, as must the other earpieces that are expected to receive the audio. This sharing
facility could allow the blind person in possession of Read2Me glasses to share the book
he/she is reading with any other user who has a wireless earpiece and is within wireless
range.

XI. Appendix A
How secured is Read2Me RPi?
Although the project does not involve anything that strictly requires security, we considered
that, since the Internet is involved, some kind of security should be integrated. The Edimax Wi-Fi
adapter supports 64/128-bit WEP encryption and WPA-PSK, WPA2-PSK, and WPS wireless
security. Furthermore, the Abbyy Cloud OCR SDK authenticates users before allowing them
to gain access to its cloud services. This is done by providing the username and the
password in the Web API. Abbyy supports Secure Socket Layer (SSL) encryption, and
advises using https:// instead of http:// in all calls. This way, all images and
recognition results travel encrypted over the network.

Image Formats supported by ABBYY Cloud OCR:

Format Extension
BMP: bmp
uncompressed black and white
4- and 8-bit uncompressed Palette
16-bit uncompressed, uncompressed Mask
24-bit uncompressed
32-bit uncompressed, uncompressed Mask
BMP: bmp
4- and 8-bit RLE compressed Palette
DCX: dcx
black and white
2-, 4- and 8-bit palette
24-bit color
PCX: pcx
black and white
2-, 4- and 8-bit palette
24-bit color

PNG: png
black and white, gray, color

JPEG 2000: jp2, jpc
gray Part 1
color Part 1
JPEG: jpg, jpeg, jfif
gray, color
PDF (Version 1.7 or earlier) pdf
TIFF: tif, tiff
black and white uncompressed, CCITT3,
CCITT4, Packbits, ZIP, LZW
gray uncompressed, Packbits, JPEG, ZIP,
LZW
24-bit color uncompressed, JPEG, ZIP, LZW
1-, 4-, 8-bit palette uncompressed,
Packbits, ZIP, LZW
(including multipage TIFF)

TIFF: tif, tiff


black and white CCITT3FAX
GIF: gif
black and white LZW-compressed
2-, 3-, 4-, 5-, 6-, 7-, 8-bit palette LZW-
compressed
DjVu: djvu, djv
black and white, gray, color
JBIG2: jb2
black and white
Table 12: Image formats supported by Abbyy [14]

Evaluation between ABBYY cloud OCR SDK and ABBYY Mobile OCR Engine
Usage Cases:

Attribute | Cloud OCR SDK | On-Device OCR with ABBYY
Target Audience | Mobile developers who need to integrate OCR as a service | Mobile developers, hardware manufacturers
Integration | High-level integration via RESTful web service | Low-level integration using the local API or wrappers
Internet Connection | Required | Not required
Processing | Out of the system (asynchronous): upload image, process, receive results | In the system (synchronous): upload, process, receive result
Scalability | Since it is cloud based, it is ABBYY's responsibility to manage processing power and ensure good processing speed; processing capacity is virtually unlimited | Sequential processing on mobile devices; the local engine cannot be scaled up indefinitely, limited processing speed
Security | HTTPS, Microsoft Azure infrastructure | Customized
Table 13: Usage cases between Cloud OCR SDK and Mobile OCR Engine [18]

Attributes:
Attribute | Cloud OCR SDK | On-Device OCR with ABBYY
OCR text recognition | Yes | Yes
ICR (Intelligent Character Recognition) | Yes | No
OCR language support | Over 200 recognition languages | Only 62 languages
Business card reading | 27 languages | 21 languages
Historic font OCR | Yes | No
GUI essentials | Not provided, only processing | Not provided, only processing
Export format post-processing | TXT, XML, ALTO XML, Doc(X), ODT, XML(X), PPT(X), PDF, PDF/A | Results are provided as a structure in plain text only
Table 14: Attributes between Cloud OCR SDK and Mobile OCR Engine

Development & Deployment:


Attribute | Cloud OCR SDK | On-Device OCR with ABBYY
Trial | Online subscription | Trial software license agreement to be signed
Development tools | All development tools integral to a web service | Only the native OS tools are provided
RAM consumption | Low, only for sending the image and retrieving results | Depends on the OS and recognition language; for most languages it requires up to 15-30 MB
Operating systems | No restrictions; only a network connection and RESTful API calls for the web service | Android, iOS, Windows, Symbian
Application size | Minimal size is small | OCR engine, libraries, and dictionaries add up to 20 MB to the application
Table 15: Development and Deployment between Cloud OCR SDK and Mobile OCR Engine

Business Model
Attribute | Cloud OCR SDK | On-Device OCR with ABBYY
Payment | As you go, via subscription | Developer licensing
Maintenance cost | Not needed; the ABBYY service is always up to date | Yes, required to implement new technology versions
RAM consumption | Low, only for sending the image and retrieving results | Depends on the OS and recognition language; for most languages it requires up to 15-30 MB
Operating systems | No restrictions; only a network connection and RESTful API calls for the web service | Android, iOS, Windows, Symbian
Application size | Minimal size is small | OCR engine, libraries, and dictionaries add up to 20 MB to the application
Table 16: Business Model between Cloud OCR SDK and Mobile OCR Engine

XII. Appendix B

COMPONENT LEVEL SPECIFICATION

Table 17: Specifications of Raspberry Pi 2 Model B

Specifications of Raspberry Pi Camera Module

Photo Resolution | 5 Megapixel (2592 x 1944)
File Size | A photo taken with the camera module will be around 2.4 MB, i.e. about 425 photos per GB
Lens | 5M
Aperture | 2.9
Focal Length | 2.9 mm
Power | Operates at 250 mA
Usage | Connect the ribbon from the module to the CSI port of the Raspberry Pi
Video Resolution | 1080p30
Picture Formats Supported | JPEG, PNG, GIF, BMP, uncompressed YUV, uncompressed RGB photos

Table 18: Specifications of Raspberry Pi Camera Module

XIII. Glossary
TTS - Text-to-Speech: software that synthesizes audible speech from text.
OCR - Optical Character Recognition: conversion of images of text into machine-readable text.
RPi - Raspberry Pi.
RESTful - conforming to the Representational State Transfer (REST) web service style.
Assistive technology - technology used to aid people with disabilities.
Image Binarization - conversion of an image to a two-level (black-and-white) form.
NTSC - National Television System Committee analog video standard.
DV Stream - Digital Video stream format.
DCT feature - feature based on the Discrete Cosine Transform.
Levenshtein distance - the number of edit operations needed to make two strings equal.
Sobel edge count detector - edge detector based on the Sobel operator.
CIColorControls - a Core Image color-adjustment filter (iOS).
CIColorMonochrome - a Core Image monochrome filter (iOS).

XIV. Bibliography

[1] World health organization official website. August 2014. [Online]. Available at
http://www.who.int/mediacentre/factsheets/fs282/en. Accessed on March 4, 2015.

[2] P. Patil, S. Solat, S. Hake and P. KHOT, 'Camera Based Product Information Reading For
Blind People', International Journal Of Engineering And Computer Science, vol. 4, no. 3, pp.
11072-11075, 2015.

[3] G. Vasanthi and Y. Ramesh Babu, 'Vision Based Assistive System for Label Detection
with Voice Output', International Journal of Innovative Research in Science, Engineering and
Technology, vol. 3, no. 1, pp. 546-549, 2014.

[4] M. Krishnaiah, B. Sandhya, Portable Camera-Based Assistive Text Reading and Human
or Vehicle Detection, International Journal of Electrical Electronics and Communication, vol.
18, no. 6, pp. 6441 6445, August 2015.

[5] R. Shilkrot and P. Maes. (2014, May 1). FingerReader: A wearable device to support text
reading on the go. [Online]. Available: http://fluid.media.mit.edu/sites/default/files/paper317.pdf

[6] Joshi Kumar, A.V., T. MadhanPrabhu, and S. Mohan Raj. A pragmatic approach to aid
visually impaired people in reading, visualizing and understanding textual contents with an
automatic electronic pen'. AMR 433-440 (2012): 5287-5292. Web. 4 Apr. 2015.

[7] R. Keefer., & N. Bourbakis. 'Interaction with a Mobile Reader for the Visually Impaired'.
21st IEEE International Conference with Artificial Intelligence Tools. 18.03 (2009): 229-236.
Web.

[8] M. Jeon, A. Ayala-Acevado, N. Nazneen, B. Walker, O. Akanser, Listen2dRoom: Helping


Blind Individuals Understand Room Layouts in CHI '12 Extended Abstracts on Human
Factors in Computing Systems, Austin, TX, U.S.A., 2012, pp. 1577 1582.

[9] Zhou, S.Z., Open Source OCR Framework Using Mobile Devices, Multimedia on Mobile
Devices 2008. Edited by Creutzburg, Reiner; Takala, Jarmo H. Proceedings of the SPIE,
Volume 6821, article id. 682104, 6 pp. (2008)

[10] R. Neto and N. Fonseca, 'Camera Reading for Blind People', vol. 11, pp. 1200-1209, 2014.
[Online]. Available: http://www.sciencedirect.com/science/article/pii/S2212017314003624

[11] 'USB Battery Pack for Raspberry Pi - 4400mAh - 5V @ 1A, ID: 1565 - $24.95: Adafruit
Industries, Unique & fun DIY electronics and kits', Adafruit.com, 2015. [Online]. Available:
http://www.adafruit.com/products/1565. [Accessed: 04-May-2015].

[12] T. Klosowski. (2013, Nov. 7).How to Pick the Right Electronics Board for Your DIY
Project. [Online]. Available: http://lifehacker.com/how-to-pick-the-right-electronics-board-for-
your-diy-pr-742869540

[13] Maker Shed, 'Raspberry Pi Comparison Chart', 2015. [Online]. Available:


http://www.makershed.com/pages/raspberry-pi-comparison-chart. [Accessed: 10- Oct- 2015].

[14] "Abbyy Mobile OCR engine," 2015. [Online]. Available: http://www.abbyy.com/mobile-


ocr/OCR-stages/.

[15] Abbyy.technology, 'Supported OCR Text/Print Types [Technology Portal]', 2015. [Online].
Available: https://abbyy.technology/en:features:ocr:supported_ocr_text_types. [Accessed: 23- Oct-
2015].

[16] GitHub, 'Has anyone tried the multi-language support featured in tesseract 3.02? Issue #28
rmtheis/tess-two', 2013. [Online]. Available: https://github.com/rmtheis/tess-two/issues/28.
[Accessed: 23- Oct- 2015].

[17] 'Smart Implementation of Text Recognition (OCR) for Smart Mobile Devices', The First
International Conference on Intelligent Systems and Applications, pp. 19-24, 2012.

[18] Abbyy.technology, 'Cloud vs. On Device OCR Processing for Mobile Applications [Technology
Portal]', 2015. [Online]. Available: https://abbyy.technology/en:comparisons:cloud_vs_mobile-on-
device_ocr. [Accessed: 23- Oct- 2015].

