Efficient Graph Similarity Search Over Large Graph Databases

Abstract:
Because graph data in real applications are often noisy and incomplete, it
has become increasingly important to retrieve the graphs g in a graph
database D that approximately match a query graph q, rather than only
exact matches. In this paper, we study the problem of graph similarity
search, which retrieves graphs that are similar to a given query graph
under a graph edit distance constraint. We propose a systematic method for
the edit-distance-based similarity search problem. Specifically, we derive
two lower bounds, partition-based and branch-based, from different
perspectives. More importantly, we propose a hybrid lower bound that
incorporates the ideas of both, which is theoretically proved to have
pruning power at least as high as using the two lower bounds together. We
also present a uniform index structure, namely u-tree, to facilitate
effective pruning and efficient query processing. Extensive experiments
confirm that the proposed approach significantly outperforms existing
approaches in terms of both pruning power and query response time.

Existing System:

Some real-life graphs, such as protein-protein interaction networks [13],
often contain noise. It is desirable to find a robust solution that
retrieves the graphs of interest to users even in the presence of noise
and errors. An interesting topic is graph similarity search, which
retrieves all graphs g from a database D that approximately match q under
some similarity measure.
Existing filters for this problem fall into two categories. The first
category relies on global statistics: one filter uses the difference in
the number of vertices/edges as the lower bound, and a second one
considers the difference in vertex/edge labels to further improve the
pruning power. Since these methods do not exploit the graph structure,
their lower bounds are not tight enough.
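The two count-based filters described above can be sketched as follows. This is only an illustration under stated assumptions: a graph is represented as a hypothetical pair (vertex-label dict, edge-label dict), every edit operation has unit cost, and the function names count_lower_bound and label_lower_bound are made up for this example rather than taken from any existing system.

```python
from collections import Counter

def count_lower_bound(q, g):
    """Lower bound on edit distance from vertex/edge count differences."""
    (qv, qe), (gv, ge) = q, g
    return abs(len(qv) - len(gv)) + abs(len(qe) - len(ge))

def label_lower_bound(q, g):
    """Lower bound from label multisets: labels without a counterpart
    in the other graph each need at least one edit operation."""
    (qv, qe), (gv, ge) = q, g
    # Counter & Counter is multiset intersection (minimum of counts).
    common_v = sum((Counter(qv.values()) & Counter(gv.values())).values())
    common_e = sum((Counter(qe.values()) & Counter(ge.values())).values())
    return (max(len(qv), len(gv)) - common_v) + (max(len(qe), len(ge)) - common_e)

# Assumed toy data: q is a path A-B-C, g is a triangle over A, B, D.
q = ({1: "A", 2: "B", 3: "C"}, {(1, 2): "x", (2, 3): "y"})
g = ({1: "A", 2: "B", 3: "D"}, {(1, 2): "x", (2, 3): "y", (1, 3): "y"})

count_lb = count_lower_bound(q, g)
label_lb = label_lower_bound(q, g)
print(count_lb, label_lb)
```

A graph g can be pruned for a threshold tau whenever either bound exceeds tau. Neither function looks at which vertices the edges connect, which is why such bounds tend to be loose.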
The other category of filters adopts the n-gram method, which is used in
the string similarity search problem. The basic idea of these methods is
to select some subgraph structures as the n-grams, and then derive the
lower bound from the n-grams that the two graphs have in common.

Proposed System:

We propose a different n-gram, namely branch, which is defined as a
structure consisting of one vertex and its adjacent edges without including
the other endpoints.
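As a minimal sketch of the branch definition above, the snippet below collects, for every vertex, its own label together with the sorted labels of its adjacent edges, leaving out the other endpoints. The graph representation (a vertex-label dict and an edge-label dict keyed by vertex pairs) and the function name branches are assumptions made for illustration only.

```python
from collections import Counter

def branches(vertices, edges):
    """Return the multiset of branches of an undirected labelled graph.

    A branch is (vertex label, sorted tuple of adjacent edge labels);
    the labels of the neighbouring vertices are deliberately ignored.
    """
    incident = {v: [] for v in vertices}
    for (u, v), label in edges.items():
        incident[u].append(label)
        incident[v].append(label)
    return Counter((vertices[v], tuple(sorted(incident[v]))) for v in vertices)

# Assumed toy graph: a path A -x- B -y- C.
vertices = {1: "A", 2: "B", 3: "C"}
edges = {(1, 2): "x", (2, 3): "y"}

result = branches(vertices, edges)
for branch, count in sorted(result.items()):
    print(branch, count)
```

Using a multiset (Counter) rather than a set keeps repeated branches, which matters when many vertices share the same label and adjacent edge labels.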
The advantage of the branch is that a single edit operation (addition,
deletion, or substitution) can affect at most two branches. A branch is
structurally similar to a c-star, except that it excludes the leaf nodes
of the c-star; for c-stars, however, one edit operation can affect
max(δ(q), δ(g)) c-stars, where δ(·) denotes the maximum vertex degree of a
graph.
If a query graph or a data graph has some high-degree vertices, the
c-star lower bound is very loose because of the large penalty ratio in the
c-star lower bound equation.
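The locality argument above can be checked on a small example. The sketch below is illustrative only: it assumes a toy hub graph, the same hypothetical dict-based graph representation, and a simplified star structure (root label plus sorted pairs of edge label and neighbour label) standing in for a c-star. It counts how many structures change after one edit operation.

```python
from collections import Counter

def neighbours(vertices, edges):
    """Map each vertex to a list of (edge label, neighbour id) pairs."""
    adj = {v: [] for v in vertices}
    for (u, v), label in edges.items():
        adj[u].append((label, v))
        adj[v].append((label, u))
    return adj

def branches(vertices, edges):
    """Vertex label plus adjacent edge labels, without the other endpoints."""
    adj = neighbours(vertices, edges)
    return Counter((vertices[v], tuple(sorted(l for l, _ in adj[v])))
                   for v in vertices)

def stars(vertices, edges):
    """Simplified star: vertex label plus (edge label, leaf label) pairs."""
    adj = neighbours(vertices, edges)
    return Counter((vertices[v], tuple(sorted((l, vertices[n]) for l, n in adj[v])))
                   for v in vertices)

def changed(before, after):
    """Number of structures in `before` with no counterpart in `after`."""
    return sum((before - after).values())

# Assumed toy graph: a hub "H" connected to five leaves "L" by "e" edges.
vertices = {0: "H", 1: "L", 2: "L", 3: "L", 4: "L", 5: "L"}
edges = {(0, i): "e" for i in range(1, 6)}

# Edit 1: substitute the label of the hub vertex.
relabelled_vertices = dict(vertices)
relabelled_vertices[0] = "Z"
branch_changes = changed(branches(vertices, edges),
                         branches(relabelled_vertices, edges))
star_changes = changed(stars(vertices, edges),
                       stars(relabelled_vertices, edges))

# Edit 2: substitute the label of a single edge.
relabelled_edges = dict(edges)
relabelled_edges[(0, 1)] = "f"
edge_branch_changes = changed(branches(vertices, edges),
                              branches(vertices, relabelled_edges))

print(branch_changes, star_changes, edge_branch_changes)
```

In this toy setting the number of changed star structures grows with the degree of the relabelled vertex, while the number of changed branches stays at two or fewer, which is the property the branch structure is chosen for.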

Hardware Requirements:

System       : Pentium IV 2.4 GHz.
Hard Disk    : 40 GB.
Floppy Drive : 1.44 MB.
Monitor      : 15" VGA Colour.
Mouse        : Logitech.
RAM          : 256 MB.

Software Requirements:

Operating system : Windows XP.
Front End        : JSP
Back End         : SQL Server

Software Requirements:

Operating system : Windows XP.
Front End        : .Net
Back End         : SQL Server
