
Image Hunter

1. Problem Identification Phase


1.1 Identification of Five Problems
1.1.1 Near Duplicate Image Recognition and Content Based Image Recognition:
With the ease of image distribution made possible by the success of the Internet, issues associated with copyright infringement and content piracy have become increasingly important. Near-duplicate image recognition can be used as an alternative to traditional watermarking for protecting copyrighted image content. In search engines (like Google etc.), image searching is done based on what the image is tagged with, not on the semantics of the image. If an image is wrongly tagged, then the search gives a wrong result. Hence Content Based Image Recognition is needed in search engines. Both can be achieved by extracting the features and semantics of the image and matching them using suitable methods.

1.1.2 Novel Algorithm for Traffic Signaling at cross roads:


The traffic signaling system we have at present works on the time-sharing model. The paths are opened only for fixed intervals. In this approach the concentration of vehicles is not taken into account. This may result in busy waiting at one end and starvation at the other. Hence an algorithm can be designed that takes the concentration of vehicles into account and limits the time that vehicles spend waiting. This is beneficial at cross roads and where multiple roads lead to a single way.

1.1.3 Preferential Electronic Voting Machine:


India is the first country to introduce EVM in parliamentary polls. But till now no EVM has been designed for Vidhana Parishat and Rajya Sabha elections. This is because these elections follow the preferential system of voting: every voter has to assign a priority to every candidate. Hence the idea here is to design an EVM for preferential voting.

1.1.4 Trespass Detection of war planes:


We have mechanisms for detecting trespassing by enemy war planes (bombers like MIG, Jaguar etc.). But these use radar mechanisms, which work on the Doppler effect. Nowadays some aircraft are designed so that they deflect the radio waves sent by the radar away from it, and hence they are untraceable by the radar. So the problem is to design a system that can be used to identify such trespassing. This system should ensure that normal objects (a group of birds etc.) are not considered trespassers.

1.1.5 Controlling IP Spoofing through Inter-Domain Packet Filters


Scope of this project: IP spoofing is most frequently used in denial-of-service attacks. Packet filtering is one defense against IP spoofing attacks. In this project we use the Border Gateway Protocol and inter-domain packet filters to defend against IP spoofing.

Introduction: Distributed Denial-of-Service (DDoS) attacks pose an increasingly grave threat to the Internet, as evident in recent DDoS attacks mounted on both popular Internet sites and the Internet infrastructure. Alarmingly, DDoS attacks are observed on a daily basis on most of the large backbone networks. One of the factors that complicate the mechanisms for policing such attacks is IP spoofing, which is the act of forging the source addresses in IP packets. By masquerading as a different host, an attacker can hide its true identity and location, rendering source-based packet filtering less effective. It has been shown that a large part of the Internet is vulnerable to IP spoofing.

Modules:
1. Constructing Routing Table
2. Finding feasible path
3. Constructing Inter-Domain Packet Filters
4. Receiving the valid packets

1.2 Definition of finalized problem with justification for choice.


In the first round of elimination, two project ideas, trespass detection of war planes and IP spoofing, were rejected. Trespass detection was rejected because the project would have to be scaled to a large extent and only simulation is possible. IP spoofing was rejected because implementing the project is not practical and, again, only simulation is possible. In the second round, content based image retrieval was retained, and the Preferential Electronic Voting Machine and the Novel Algorithm for Traffic Signaling were left out. The EVM was not feasible on the 8052 microcontroller and there would be a large learning overhead. In the case of the signaling algorithm, deployment was the problem, and the simulation results were not expected to match the actual results.

1.3 Introduction
1.3.1 Introduction:
The variety of environmental data (text, diagrams, images, videos, and other types of measurements), together with the fact that they are stored in worldwide distributed databases, creates new challenges when attempting to retrieve relevant information. Furthermore, this information is usually attached to time and space. This project focuses on using both LOCAL and GLOBAL features of the image to retrieve relevant information. It also aims to detect piracy by applying the same technique.

1.3.2 Problem Definition:


To develop a system through which:
i. It is possible to find the near duplicate images of a given original image.
ii. It is possible to search for a content related image in a database by supplying a sample image.

1.3.3 Objectives (features) of the project:


There are six main objectives set for the project development. They are described below.

1.3.3.1 New feature extraction method that simultaneously captures the global and local characteristics of an image by adaptively computing hierarchical geometric centroids of the image.
1.3.3.2 Since only relevant information is retrieved from DB searches, searching time is saved tremendously.
1.3.3.3 Ranked results provide a clear idea about the gap between the content searched for and the search result. (Currently the GOOGLE search engine uses the PAGE RANK ALGORITHM, but the semantic gap between the images cannot be found out.)
1.3.3.4 To solve issues associated with copyright infringement and content piracy through Near Duplicate Image Detection.
1.3.3.5 This can be applied in forensic sciences, in which it is possible to detect a culprit based on the content in live footage.
1.3.3.6 Finds its application in semantic image analysis in medical diagnosis.


2. System Study
2.1 Existing System (Advantages and Disadvantages of existing System):
The system to be designed can be thought of both as an alternative to the existing system and as a new self-contained product. Content based image retrieval is the alternative to tag-based searching. Near duplicate search for morphed images is a new self-contained product for identifying morphed images, which can help in stopping piracy and copyright infringement.

2.2 Proposed System:


The intended system shall be able to identify the Content related images in the database. Here the retrieved images should be content related to the query image. Also, the intended system shall be able to find out the Near duplicate set of images from a database with reference to a query image.

2.3 Advantages of proposed system:


2.3.1 New feature extraction method that simultaneously captures the global and local characteristics of an image by adaptively computing hierarchical geometric centroids of the image.
2.3.2 Since only relevant information is retrieved from DB searches, searching time is saved tremendously.
2.3.3 Ranked results provide a clear idea about the gap between the content searched for and the search result. (Currently the GOOGLE search engine uses the PAGE RANK ALGORITHM, but the semantic gap between the images cannot be found out.)
2.3.4 To solve issues associated with copyright infringement and content piracy through Near Duplicate Image Detection.
2.3.5 This can be applied in forensic sciences, in which it is possible to detect a culprit based on the content in live footage.
2.3.6 Finds its application in semantic image analysis in medical diagnosis.
2.3.7 The rank will never change until the content of the image changes (unlike page rank, where ranking changes in short intervals).

2.4 Feasibility Study


No hardware components are involved in the current scope of the project, and the software and IDEs used are open source, so no cost is incurred on them. The time estimates are:
System study and Requirement Collection: 50 hours
System Design and design optimization: 70 hours

2.5 Constraints:
2.5.1 Images are sensitive to illumination, as their features change due to illumination.
2.5.2 Features of the image, both LOCAL and GLOBAL, have their own weaknesses.
2.5.3 The results obtained are approximate, not exact.
2.5.4 These systems usually fail when the contrast of the image is changed.

3. Software Requirement Specification


3.1 Introduction
3.1.1 Purpose
The intended system shall be able to identify the Content related images in the database. Here the retrieved images should be content related to the query image. Also, the intended system shall be able to find out the Near duplicate set of images from a database with reference to a query image.

3.1.2 Scope of Project


The system to be designed searches for images based on the semantics of the queried image, and the results are ranked based on their conceptual distance from the queried image. In the case of the near duplicate search, the morphed images are identified and the results are likewise ranked by conceptual distance. This ranking can be an alternative to the traditional page ranking of images (done on the number of visits). The advantage is that the rank will never change until the content of the image changes (unlike page rank, where ranking changes in short intervals).

3.1.3 Intended Audience


Since this is a college level project, the intended audience is as follows.
1. Project Guide: The Project Guide can use this document for monitoring the progress in the development of the project and set guidelines according to it.
2. Evaluation committee: The Evaluation Committee can use this document to map the requirements quoted to the detailed design document, and to check that the requirements are complete and unambiguous.
3. Peers: Peers can use this document to suggest any changes needed and give reviews on the project.

3.1.4 References:
[2] Mai Yang, Guoping Qiu, Jiwu Huang and Dave Elliman, Near-Duplicate Image Recognition and Content-based Image Retrieval using Adaptive Hierarchical Geometric Centroids, The 18th International Conference on Pattern Recognition, 2006
[3] Ritendra Datta, Dhiraj Joshi, Jia Li and James Z. Wang, Image Retrieval: Ideas, Influences, and Trends of the New Age
[4] S. Arivazhagan, L. Ganesan, S. Selvanidhyananthan, Image Retrieval using Shape Feature
[5] Prof. Dr. H.-J. Schek, Region Based Image Similarity Search

3.2 Overall description


3.2.1 Product Perspective
The system to be designed can be thought of both as an alternative to the existing system and as a new self-contained product.
i) Content based image retrieval is the alternative to tag-based searching.
ii) Near duplicate search for morphed images is a new self-contained product for identifying morphed images, which can help in stopping piracy and copyright infringement.

3.2.2 User Classes and Characteristics


There are two user classes intended to use this system. They are:
3.2.2.1 Administrator: The administrator is the one who populates the database. That means the administrator is the only privileged person who can upload new images to the database.
3.2.2.2 User: The user is the one who queries the database. The user is provided with 2 choices: Content based search or Near duplicate search. In both cases, the user uploads a query image to get the ranked results.

3.2.3 Operating Environment


Hardware requirements:
Intel Dual core or higher end processor (2.0 GHz)
1 GB RAM

Software requirements:
Windows XP (SP2) or higher versions
MySQL as Database System
Java Run Time Environment
JDK 1.6 and Netbeans IDE
WAMP server

3.2.4 Design and Implementation Constraints


At present the system concentrates on the RGB (Red, Green and Blue) colour space. For future enhancements, other colour spaces such as HSI can be employed. The system fails to retain the image features when the contrast of the image changes. In future this constraint can be taken care of. In the current scope the system works with only one type of image at a time (.jpeg or .gif or .png).

3.2.5 Assumptions and dependencies


The algorithm proposed for designing this system is the Adaptive Geometric Centroid Algorithm, which works on the concept of centroids of matrices. The algorithm assumes the following:
3.2.5.1 Centroids are invariant to scaling of the image.
3.2.5.2 Centroids are also invariant to the illumination of the image.
Since the system to be designed is an independent entity, there are no dependencies.

3.3 Requirement Specification


3.3.1 Functional Requirements
3.3.1.1 Admin shall be able to populate the database of the system by uploading one image at a time: Admin, who is authenticated using alphanumeric password authentication, is given the privilege to upload an image. On uploading the image, the mathematical equivalent of that image (discussed in the design document) will be stored in the system database. No users other than Admin are able to populate this database.
3.3.1.2 Admin shall be able to log into the system using an alphanumeric password system: This is the security requirement. Since the database is a shared entity, only Admin can update it. Hence authentication is needed.
3.3.1.3 User (not authenticated) can query the database. This has 2 options:
3.3.1.3.1 Content Based Searching: In this, the user is able to query the database for content related images.
3.3.1.3.2 Near duplicate searching: In this, the user is able to search for the near duplicate images of the query image.
3.3.1.4 User shall be able to enter the attributes for searching: The user should be able to specify the attributes of searching. The attribute set is common to both kinds of searching. The attribute list is as given below.
3.3.1.4.1 Specify the range of ranks: Since the result set will be ranked on the conceptual distance between the searched image and the queried image, the user shall be able to specify only the range of ranks in which he/she is interested.
3.3.1.5 In case of content based searching, the user can prioritize the searching by choosing the factor that balances the colour and centroid features of the image. This can be implemented as a scale from 0 to 1, where 0 means the colour feature is not included.
3.3.1.6 User shall be able to see the ranked result: Once the database is queried with respect to an image in either type of searching, the results are displayed to the user based on the ranking of the images. The greater the distance, the lower the ranking.

3.4 Nonfunctional Requirements


3.4.1 Performance requirements
There are 3 basic performance requirements for this project. They are:
i) Response Time: The delay between the user input and the output should not exceed 5 seconds. This means the model should be built with concurrency between subsystems.
ii) Recall: This is one of the measures of the effectiveness of a retrieval algorithm. It is the ratio of the number of relevant images retrieved to the total number of relevant images in the database. This ratio should be more than 85% for the proposed system.
iii) Precision: This is the second measure of the effectiveness of the algorithm. It is the ratio of the number of relevant images retrieved to the total number of images retrieved in the result set. This performance rating should also be more than 85% for the proposed system.
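
As a minimal sketch of how the recall and precision targets could be checked for a single query, the following uses the standard information-retrieval definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / all relevant). The function name `precision_recall` and the sample identifiers are illustrative assumptions, not part of the project.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers of the images returned by the search
    relevant:  identifiers of the images that are truly relevant to the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant images that were retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: 4 images returned, 3 of them relevant, 5 relevant in total.
p, r = precision_recall(["a", "b", "c", "x"], ["a", "b", "c", "d", "e"])
meets_target = p > 0.85 and r > 0.85  # the 85% threshold stated in the requirement
```

In this made-up example the precision is 3/4 and the recall is 3/5, so the query would not meet the 85% target.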

3.4.2 Safety requirements


Safety requirements are those that concern the safety of human operators, business systems and environmental hazards. Since the proposed system does not involve any of these, safety requirements do not apply to it.

3.4.3 Security Requirements


Security requirements are those where access to a system or a part of a system is restricted to authenticated use. Hence the permission to populate the image database, which is given to the admin, should be authenticated using an alphanumeric password system. The requirement can therefore be stated as: the admin shall be able to log into the system using an alphanumeric password system.

3.4.4 Software Quality Attributes


Functionality
Since the intended project runs on the single machine on which the image database is present, there is no dependency on network attributes. Hence the project will be functional whenever the machine on which it is running is up.

Performance
Performance is measured by recall and precision values which will not be less than 85%.

Security
There are 2 users for the system. General user is unauthenticated as there is no privilege to alter the database. As admin populates the database, Admin is always authenticated.

Availability
The system is available on the single machine on which it is installed, provided the database system is also available.

Usability
This system finds its use in medical applications, forensic science and search engines.

Interoperability
This is independent software. This can be made to operate with the database of similar applications.

3.4.5 User Documentation


The evaluation committee can use this document in evaluating the project with respect to all the requirements stated.

The document can be used by the project guide to track the progress of the project and to provide guidelines about suitable changes to be made in later parts of project development.

Peers can use this document to test the project against the requirements stated and to suggest any improvements.

3.5 External Interface Requirements


3.5.1 User Interfaces
A simple graphical user interface is needed for the implementation of this project. The logical views of the user interfaces are given below.
3.5.1.1 Administrator Login: This page has text boxes for the admin to enter the user name and password. On successful login, the admin can populate the database.
3.5.1.2 Working interface for admin: Here, the admin will browse for an image on the hard drive and upload it to the database. On doing so, the database will be populated with the image features.
3.5.1.3 User interface for uploading image: This is the area where the user can browse for an image on the hard drive for either content based or near duplicate searching. On uploading the image and submitting, the user should be taken to one more interface, given below.
3.5.1.4 User interface for attributes: In this interface, the user can set the attributes of searching. The attributes that are to be set by the user are
i. Tuning Factor
ii. Range of rank
iii. Conceptual distance
3.5.1.5 Result interface: This is the final screen, where the user gets the resultant images ranked on distance. This screen contains the result of either the content related or the near duplicate search.

3.5.2 Hardware Interfaces


This project mainly focuses on the software part of CBIR and NDIR. Hence hardware interfaces are not applicable. In future, this can be extended to the hardware domain, wherein the search result appears on the live stream from a video camera.

3.5.3 Software Interfaces


The only software interface required is to have a database manager to connect to the database. The database manager used here is MySQL Turbo Manager.

3.5.4 Communications Interfaces


There is no involvement of POP, SMTP or other protocols. Hence the system has no communication interfaces.

Acceptance Test Plan

Test Id: 1
Input Description: Admin uploads an image by browsing from the hard drive to the system.
Expected Output: On doing so, the database should be inserted with a tuple for the feature set of the image. And the admin should be given the chance to upload one more image.
Actual Output:

Test Id: 2
Input Description: Admin logs into the system using the alphanumeric password system.
Expected Output: When the user enters the correct user name and password, the system should continue its execution. Else, the system should halt there till the correct user name and password are entered.
Actual Output:

Test Id: 3
Input Description: User searches based on an image by browsing for an image on the hard drive.
Expected Output: On successful execution, the new interface for attributes should appear on the screen. Here the user is allowed to make a choice between CBIR and NDIR.
Actual Output:

Test Id: 3.1
Input Description: User chooses CBIR.
Expected Output: Then the ranked result set will appear as per the content based search.
Actual Output:

Test Id: 3.2
Input Description: User chooses NDIR.
Expected Output: Then the ranked result set will appear as per the near duplicate search.
Actual Output:

Test Id: 4
Input Description: User chooses the attributes for searching.
Expected Output: In this case the user specifies attributes like rank range and distance range. The result of searching will follow the attributes.
Actual Output:

Test Id: 5
Input Description: User chooses the tuning factor.
Expected Output: As the algorithm can be tuned for the involvement of colour and structure, users are allowed to tune between the two. That means a value between these two points can be taken. This refines the result.
Actual Output:

Test Id: 6
Input Description: User sees the ranked result.
Expected Output: The searching should happen conceptually, as per the semantics, and the ranked results should appear on the screen.
Actual Output:

3.6 Other Requirements:


Any other requirements apart from the ones stated above are not the part of the system requirement.


3.7 Appendix A: Glossary


Abbreviations
CBIR: Content Based Image Recognition
NDIR: Near Duplicate Image Recognition

4. Software Design Document

4.1 Introduction
4.1.1 Summary:
4.1.1.1 Purpose of project
The intended system shall be able to identify the content related images in the database. Here the retrieved images should be content related to the query image. Also, the intended system shall be able to find the near duplicate set of images in a database with reference to a query image.

4.1.1.2 Scope of Project
The system to be designed searches for images based on the semantics of the queried image, and the results are ranked based on their conceptual distance from the queried image. In the case of the near duplicate search, the morphed images are identified and the results are likewise ranked by conceptual distance. This ranking can be an alternative to the traditional page ranking of images (done on the number of visits). The advantage is that the rank will never change until the content of the image changes (unlike page rank, where ranking changes in short intervals).

4.1.1.3 Intended audience
Since this is a college level project, the intended audience is as follows.
i. Project Guide: The Project Guide can use this document for monitoring the progress in the development of the project and set guidelines according to it.
ii. Evaluation committee: The Evaluation Committee can use this document to map the requirements quoted to the detailed design document, and to check that the requirements are complete and unambiguous.
iii. Peers: Peers can use this document to suggest any changes needed and give reviews on the project.

4.1.2 Terminology
Some of the terms used in the document are:
Feature: Feature vector that describes the feature set of the image.
Distance: This refers to the Canberra distance between two vectors (image vectors).
Tuning factor: Refers to the weighting factor in the distance formula for the CBIR distance based ranking.

4.1.3 Design Goals and Non Goals


Goals: The variety of environmental data (text, diagrams, images, videos, and other types of measurements), together with the fact that they are stored in worldwide distributed databases, creates new challenges when attempting to retrieve relevant information. Furthermore, this information is usually attached to time and space. This project focuses on using both LOCAL and GLOBAL features of the image to retrieve relevant information. It also aims to detect piracy by applying the same technique.

Non Goals: Providing an interface to other applications which need the service of the project. (That is, if the system is supposed to be a subsystem of a larger system, the interface is not provided in the current scope.)

4.1.4 Common Scenarios


Two common scenarios in the project are:
Admin populates the database: Admin populates the database with a feature vector by uploading an image.
User queries the database: User searches for the near duplicate or content related images in the database.

4.2 Architectural Design


4.2.1 Logical View: The intended system is designed in a SINGLE TIER ARCHITECTURE.

Presentation logic: The processing (instructions, routines, etc.) required to display or print data. It typically refers to the execution of the user interface (GUI).
Business logic: Describes the functional algorithms that handle information exchange between a database and a user interface.
Data access logic: Provides simplified access to data stored in persistent storage of some kind, such as an entity-relational database.

The block diagrams for both the intended users are as follows.

4.2.1.1 ADMIN:

4.2.1.2 USER:

ABSTRACT SPECIFICATION:

User:
i) Query Image: Browse for an image on the local hard drive and query with that image.
ii) Extract Features: The feature sets are extracted one at a time from the database. Each feature set retrieved from the DB is matched with the features of the uploaded image using Canberra distance.
iii) Sort the distances in ascending order and display the result as per the sorted distance.

Admin:
Extract Features: Compute colour at all 63 centroids.
Store in database: Store the features in individual columns along with the path where the image is stored.

4.3 Detailed Design


IMAGE HUNTER APPLICATION

4.3.1 Class Diagram:


USER
Operations: User_operation()

ADMIN
Operations: Admin_login(), Admin_operation()

IMAGE
Attributes: Location, Width, Height, Features, CompMatrices
Operations: LoadImage(), ExtractFeatures(), StoreInDB()

FEATURES
Attributes: CentroidFeatures, ColourFeatures
Operations: StoreFeaturesInDB

SEARCH_OBJECT
Attributes: Distance
Operations: Compute_distance_CBIR(), Compute_distance_NDIR()

To hold the image location and rank, a simple structure can be used as below:

struct image_distance { Image; Distance; }

4.3.2 Use case diagrams:

[Use case diagram: ADMIN, Uploads an image, Feature Extraction, DB]

Use case: Admin uploads an image
Success Scenario: The image gets stored in a location and the database is populated with the feature set of the image.

[Use case diagram: USER, Uploads an image, Feature Extraction, DB, Compute Distance and rank, Results, VIEW RESULTS]

Use case: User searches the database
Success Scenario: The ordered set of results is displayed to the user.

4.3.3 Sequence Diagram
1. Admin

2. User

4.3.4 Data Flow Diagrams

4.4 User Interface Design:

Component Design: Feature extraction can be divided into two parts.

Centroid features: Hierarchical centroid. Here the hierarchy of centroids is computed first.

Colour features: Colour at centroids. In this subsystem only the colour at each given centroid is computed.
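
The two components above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the project's implementation: the image is assumed to be a grayscale matrix (list of rows), each region is assumed to be split in two at its intensity centroid, alternating between vertical and horizontal cuts, and six levels are used so that 1 + 2 + 4 + 8 + 16 + 32 = 63 centroids result. The function names `region_centroid`, `hierarchical_centroids` and `colour_at_centroids` are hypothetical.

```python
def region_centroid(img, x0, x1, y0, y1):
    """Intensity-weighted centroid of the region img[y0:y1][x0:x1], in pixels."""
    mass = sx = sy = 0.0
    for y in range(y0, y1):
        row = img[y]
        for x in range(x0, x1):
            w = row[x]
            mass += w
            sx += w * (x + 0.5)  # use pixel centres
            sy += w * (y + 0.5)
    if mass == 0:  # empty or all-black region: fall back to the geometric centre
        return (x0 + x1) / 2.0, (y0 + y1) / 2.0
    return sx / mass, sy / mass


def hierarchical_centroids(img, levels=6):
    """Return 2**levels - 1 centroids, normalised to [0, 1] by image size."""
    height, width = len(img), len(img[0])
    centroids = []
    regions = [(0, width, 0, height)]
    for level in range(levels):
        next_regions = []
        for (x0, x1, y0, y1) in regions:
            cx, cy = region_centroid(img, x0, x1, y0, y1)
            centroids.append((cx / width, cy / height))
            if level % 2 == 0:  # assumed rule: cut left/right at the centroid column
                s = min(max(int(round(cx)), x0), x1)
                next_regions += [(x0, s, y0, y1), (s, x1, y0, y1)]
            else:               # assumed rule: cut top/bottom at the centroid row
                s = min(max(int(round(cy)), y0), y1)
                next_regions += [(x0, x1, y0, s), (x0, x1, s, y1)]
        regions = next_regions
    return centroids


def colour_at_centroids(img, centroids):
    """Sample the pixel value at each (normalised) centroid position."""
    height, width = len(img), len(img[0])
    values = []
    for cx, cy in centroids:
        x = min(int(cx * width), width - 1)
        y = min(int(cy * height), height - 1)
        values.append(img[y][x])
    return values


# Hypothetical 8x8 uniform test image.
image = [[1] * 8 for _ in range(8)]
centroids = hierarchical_centroids(image)
colours = colour_at_centroids(image, centroids)
features = [c for point in centroids for c in point] + colours
```

With six levels the sketch yields 63 centroids, that is 126 coordinate features, plus one sampled value per centroid. Normalising the coordinates by the image size is one way to make them comparable across image scales; the project itself works in RGB, so a real implementation would sample colour rather than a single gray value.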

4.5 Database Design


4.5.1 ER Diagram:
This application concentrates on searching only one type of multimedia object (image). Hence the Entity Relationship diagram is not needed.

Table: Features
Columns: Image Location | Feature 1 | ... | Feature n
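
A minimal sketch of such a table is given below. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for the MySQL database named in the requirements; the table name `features`, the column names `image_location` and `feature_1 ... feature_n`, and the helper functions are assumptions made for illustration.

```python
import sqlite3

N_FEATURES = 4  # kept small for the example; the real feature vectors are much longer


def create_feature_table(conn, n_features=N_FEATURES):
    """Create one row per image: its location plus one column per feature."""
    cols = ", ".join(f"feature_{i} REAL" for i in range(1, n_features + 1))
    conn.execute(f"CREATE TABLE features (image_location TEXT PRIMARY KEY, {cols})")


def store_features(conn, location, features):
    """Insert the image path and its feature vector as a single tuple."""
    placeholders = ", ".join("?" for _ in range(len(features) + 1))
    conn.execute(f"INSERT INTO features VALUES ({placeholders})", [location, *features])


def load_all(conn):
    """Return [(location, [feature values])] for every stored image."""
    cursor = conn.execute("SELECT * FROM features ORDER BY image_location")
    return [(row[0], list(row[1:])) for row in cursor]


conn = sqlite3.connect(":memory:")
create_feature_table(conn)
store_features(conn, "images/cat.jpg", [0.5, 0.5, 0.25, 0.75])  # hypothetical image
rows = load_all(conn)
```

Storing each feature in its own column mirrors the table layout described above; only the feature vector and the image path are stored, not the image itself.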

4.5.2 Algorithm Design:


4.5.2.1 Near Duplicate Image Recognition
i. Compute the feature vector of the query image.
ii. Compare the query image vector with the ones in the database.
iii. If u is the query image vector and v is the vector of an image in the database, find the Canberra distance
$$d(u, v) = \sum_{i=1}^{n} \frac{|u_i - v_i|}{|u_i| + |v_i|}$$
where n is the number of features.
iv. Find the distance with respect to all images in the database.
v. Sort all the distances in ascending order. The image with the least distance is the nearest duplicate.
vi. The results can be ranked on distance.
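
The steps above can be sketched as follows, assuming the distance is the Canberra distance named in the terminology section and that the database is available as a mapping from image location to feature vector. The names `canberra_distance` and `rank_near_duplicates`, the optional `rank_range` filter and the sample vectors are illustrative assumptions.

```python
def canberra_distance(u, v):
    """Canberra distance: sum of |u_i - v_i| / (|u_i| + |v_i|) over all features."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom != 0:  # a 0/0 term is conventionally counted as 0
            total += abs(a - b) / denom
    return total


def rank_near_duplicates(query, database, rank_range=None):
    """Return [(rank, location, distance)] sorted by ascending distance.

    database:   dict mapping image location -> feature vector
    rank_range: optional (lowest, highest) rank to keep, both inclusive
    """
    scored = sorted((canberra_distance(query, vec), loc) for loc, vec in database.items())
    ranked = [(i + 1, loc, dist) for i, (dist, loc) in enumerate(scored)]
    if rank_range is not None:
        lowest, highest = rank_range
        ranked = [r for r in ranked if lowest <= r[0] <= highest]
    return ranked


# Hypothetical three-image database with very short feature vectors.
db = {
    "original.jpg": [0.5, 0.5, 0.25],
    "cropped.jpg": [0.5, 0.4, 0.25],
    "other.jpg": [0.1, 0.9, 0.8],
}
results = rank_near_duplicates([0.5, 0.5, 0.25], db)
top_two = rank_near_duplicates([0.5, 0.5, 0.25], db, rank_range=(1, 2))
```

The smallest distance gets rank 1, so an exact copy of the query (distance 0) always comes first.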

4.5.2.2 Content Based Image Recognition
i. The feature vector also includes the colour at all 63 centroid points.
ii. Thus the feature vector now contains 126 + 63 = 189 features.
iii. Then extract the features.
iv. Find the distance using the tuned distance formula, which balances the centroid (structure) features against the colour features.
v. Set the value of the tuning factor, which decides whether a user prefers structure or colour.
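
The distance formula itself is not reproduced in this document, so the following is only a hedged sketch of one plausible form: a weighted sum of the Canberra distance over the 126 centroid features and the Canberra distance over the colour features, with a tuning factor `alpha` in [0, 1] where 0 means colour is not included (as stated in the functional requirements). The weighting `(1 - alpha) * structure + alpha * colour`, the name `cbir_distance` and the parameter `n_centroid` are assumptions.

```python
def canberra_distance(u, v):
    """Canberra distance; terms with a zero denominator are counted as 0."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if abs(a) + abs(b) != 0)


def cbir_distance(query, candidate, alpha, n_centroid=126):
    """Blend structure and colour distances with tuning factor alpha in [0, 1].

    The first n_centroid entries of each vector are centroid coordinates,
    the remaining entries are the colour values sampled at the centroids.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    d_structure = canberra_distance(query[:n_centroid], candidate[:n_centroid])
    d_colour = canberra_distance(query[n_centroid:], candidate[n_centroid:])
    return (1.0 - alpha) * d_structure + alpha * d_colour  # assumed weighting


# Hypothetical tiny vectors: 2 centroid features followed by 1 colour feature.
q = [1.0, 1.0, 10.0]
c = [1.0, 3.0, 10.0]
structure_only = cbir_distance(q, c, alpha=0.0, n_centroid=2)
colour_only = cbir_distance(q, c, alpha=1.0, n_centroid=2)
balanced = cbir_distance(q, c, alpha=0.5, n_centroid=2)
```

In this made-up example the two images differ only in structure, so increasing `alpha` towards 1 shrinks the distance between them.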

4.6 Logging
Not applicable to this project.

4.7 Exceptions
The system should warn when the uploaded image is not of a supported type. A warning should also be raised when the size of the uploaded image exceeds some threshold value.
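
A minimal sketch of these two checks is shown below. The allowed extensions follow the image types listed in the design constraints (.jpeg, .gif, .png); accepting ".jpg" as well, the 5 MB threshold and the function name `validate_upload` are assumptions, since the document does not fix them.

```python
import os

# .jpeg, .gif and .png come from the design constraints; ".jpg" is an assumed alias.
ALLOWED_EXTENSIONS = {".jpeg", ".jpg", ".gif", ".png"}
MAX_SIZE_BYTES = 5 * 1024 * 1024  # hypothetical threshold; not specified in the document


def validate_upload(filename, size_bytes):
    """Return a list of warning messages; an empty list means the upload is acceptable."""
    warnings = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        warnings.append(f"unsupported image type: {extension or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        warnings.append(f"image is larger than {MAX_SIZE_BYTES} bytes")
    return warnings


ok = validate_upload("query.png", 200_000)
bad_type = validate_upload("query.bmp", 200_000)
bad_both = validate_upload("scan.tiff", 50 * 1024 * 1024)
```

Returning a list of warnings rather than raising on the first problem lets the interface show every issue with the upload at once.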

4.8 Localization
This project can find its localization in:
Medical diagnosis
Forensic science
Search engines etc.

This system can be built on existing computing architectures.

4.9 Dependencies
The algorithm proposed for designing this system is the Adaptive Geometric Centroid Algorithm, which works on the concept of centroids of matrices. The algorithm assumes the following:
i. Centroids are invariant to scaling of the image.
ii. Centroids are also invariant to the illumination of the image.
Since the system to be designed is an independent entity, there are no dependencies.

4.10 Deployment diagram

The system is deployed on a single machine, on which both the application and the database run.

4.11 Design Decisions


Feature extraction is applied both at administrator and general user end. The other approach is to store image in the database and then while searching apply feature extraction on the stored images. The first approach is feasible as it takes less time in searching.

4.12 Open issues


The issue that needs to be resolved before finalizing the design is to have a mathematical proof of whether this centroid algorithm is rotation invariant. (It is assumed to be both illumination and scaling invariant.)

5. References
[1] http://www.agilemodeling.com/artifacts/deploymentDiagram.htm
[2] Mai Yang, Guoping Qiu, Jiwu Huang and Dave Elliman, Near-Duplicate Image Recognition and Content-based Image Retrieval using Adaptive Hierarchical Geometric Centroids, The 18th International Conference on Pattern Recognition, 2006
[3] Ritendra Datta, Dhiraj Joshi, Jia Li and James Z. Wang, Image Retrieval: Ideas, Influences, and Trends of the New Age
[4] S. Arivazhagan, L. Ganesan, S. Selvanidhyananthan, Image Retrieval using Shape Feature
[5] Prof. Dr. H.-J. Schek, Region Based Image Similarity Search

6. Appendix
Work is continuing on Content Based Image Search and near duplicate searching. One beta version of a search engine that operates on a visual query is available at http://www.tiltomo.com/
Google is also working on CBIR, and its first iteration (prototype) is available at http://similar-images.googlelabs.com/