Proceedings of the 17th International Conference on Database and Expert Systems Applications (DEXA'06)
0-7695-2641-1/06 $20.00 © 2006
Authorized licensed use limited to: Maharashtra Institute of Technology. Downloaded on August 16,2010 at 11:49:18 UTC from IEEE Xplore. Restrictions apply.
2 Related Work

3 Problem Statement
3.3 Peer-to-Peer Join

Definition 1 P2P-Join is a join operation that combines two (or more) relations from two (or more) peers based on the semantics of the keywords and the syntax of the join operation of relational database systems.

For example, consider a query Q = {k1, k2}. Suppose there are two peers P1 and P2, such that peer P1 maintains a relation R1(A1, ..., B, ...) and peer P2 maintains a relation R2(A2, ..., B, ...), and both relations share a common attribute B. Furthermore, suppose some values of attribute A1 are k1 while some values of attribute A2 are k2. If there exist a tuple <k1, ..., b, ..., x> in R1 and a tuple <k2, ..., b, ..., y> in R2, then a P2P join operation can be performed and the result is <k1, b, k2, ..., x, y>.

From the above example, we present a theorem for P2P-Join, which can be easily proved.

Theorem 1 Two tuples in different peers are joined by a P2P join operation if and only if the common attributes of the two relations, which pertain to two different peers, have at least one equal value and, furthermore, the pair of tuples contains different keywords in the query Q.

4 Framework of P2P-Join

4.1 An Overview

In general, query processing consists of six steps: query distribution, local processing, information exchange among peers, join graph generation, P2P join, and result propagation.

When a query is submitted, it is distributed to all neighbors. The neighbors further forward the query to their own neighbors, and so on, until the query's lifetime (Time-to-Live, TTL) expires.

Each peer that receives the query first looks up the full-text index of its database to decide whether it contains some or all of the keywords in the query. Based on keyword search methods in databases, tuples that contain all keywords are returned to the requesting peer directly, while tuples containing only some of the keywords are used for the peer-to-peer join. At present, there are many approaches to support local keyword-based query processing, such as DISCOVER [9], BANKS [2], and DBXplorer [1].

Based on the results of local processing, each peer exchanges information with its neighbors, namely the keywords in the tuples that include only a subset of the keywords in the query. Once it has its neighbors' information, a peer generates a join graph, in which the vertices are peers and the edges connect the pairs of peers that should be joined. How a join graph is generated is described in the next section.

With the join graph, a peer can identify the peers with which it can be joined and try to perform a P2P join. The join results that contain all keywords in the query can be returned directly to the requesting peer, while the results containing only some of the keywords are propagated along the join paths until the final results, which include all keywords in the query, are generated and returned to the user.

4.2 The Generation of Join Graph

A join graph is a graph G(V, E), in which V is the vertex set and E is the edge set. In such a graph, each vertex denotes a peer with one keyword in the query Q. Under this strategy, one peer may be denoted by several different vertices if it contains several keywords in the query. Two peers are connected by an edge if and only if they contain two different keywords in the query and they share certain common attributes with the same meaning. A peer's own vertices may be connected if the peer contains more than one keyword in the query and they can be joined locally.

After processing the query locally, each peer sends the query keywords that it contains to all of its neighbors that have also been accessed by the query with the same query identity. When a peer receives the keywords from one of its neighbors, it compares them with the query keywords that appear in its own local database. If one or more keywords in the local database differ from those of the neighbor candidate, the peer establishes a connection with that neighbor. With these operations, the join graph for a query is generated.

From the above, we can see that only peers connected by an edge in the join graph need to be joined.

4.3 Peer-to-Peer Join

First, we consider the problem of a P2P join between two relations from two different peers. For example, let R1(A1, ..., B, ...) and R2(A2, ..., B, ...) be two relational tables belonging to two different peers P1 and P2. According to the definition of P2P join, the tuples whose values of attribute A1 or A2 are not k1 or k2 can be filtered out directly, while the other tuples are retained. We would like the filtering operation to be performed locally first, since this reduces the bandwidth cost and distributes the processing load across different peers, thus improving query processing performance to some extent.

We can further improve performance with optimization techniques employed in RDBMSs, such as semi-join.

4.4 The Propagation of Results

A join path is a path in the join graph in which each keyword in the query appears exactly once in the vertices. A
final result can only be obtained after all join operations on the edges in a join path have been executed. Furthermore, when all join paths have been traversed, we assume that the complete answer set to a given query has been obtained. Therefore, to obtain the results of a query, each join path should be traversed, as described by the following procedure.

BEGIN
  For each join path
    For each edge connecting peers p and q
      P2P-Join p and q
      Reset neighbors and keywords
      Delete edge(p, q)
    Once a path is traversed,
      send results to the requesting peer
END

Note that the neighborhood relationship changes as the join process goes on. Furthermore, the process traverses all the join paths, and the complete answer set to the given query is obtained when it finishes. Compared to traditional processing, the above traversal procedure in a P2P system is fully distributed.

The above algorithm is similar to the problem of graph traversal, whose cost has been proved to be exponential. However, with a fully distributed approach, the situation is different. Here, we only perform a worst-case time analysis. Suppose that in a P2P system, n peers have been accessed by the query, and the maximum expected number of neighbors of a peer is k. Obviously, $k \ll n$. Then the inner loop is executed at most $O(k \cdot m!)$ times, where m is the number of keywords in the query. Therefore, the total execution time in the worst case is $O(k \cdot n \cdot m!)$. Since k and m are much smaller numbers, the time complexity of the algorithm is much less than $n^2$.

5 Improvements

[Figure 1. An example of Join Graph; the labels 1 to 5 are referred to as edges in the text.]

... path will have different cost, thus providing some opportunities for further optimization.

In the context of databases, it is widely accepted that sharing common computation is beneficial. For example, if we need to process both $(A \bowtie B)$ and $(A \bowtie B \bowtie C)$, we can process $(A \bowtie B)$ first and store its result for later use when processing $(A \bowtie B \bowtie C)$. Here, we use this heuristic to guide the choice of the join sequence.

Using the above join graph as an example, we now illustrate how the heuristic can help reduce computation cost. We consider how different orders of operations on the edges affect the reuse of edge 3 in the join graph, which reflects how efficiently the resources of the relational database systems are used.

1. If the join operation on edge 3 is executed first, the partial result can be reused by the further join operations (1,3), (1,3,4), (1,3,5), (2,3), (2,3,4), and (2,3,5), so that the total number of reuses of edge 3 is 6.

2. However, if the join operation on edge 1 (or 2) is executed first, the partial result of edge 3 can only be reused by (1,3,4) and (1,3,5) (or by (2,3,4) and (2,3,5)). Thus, the total number of reuses of edge 3 is only 2.
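The reuse counts in the two cases above can be checked with a small sketch. It rests on one reading of the example that the text does not state explicitly: the join plans are the six edge sequences listed in case 1, and a stored partial result is reused by every plan whose edge set strictly contains the edges of that stored result. The names PLANS and reuse_count are hypothetical and serve only this illustration.

```python
# Join plans from the example, written as sets of edge labels of Figure 1.
# Assumption: these six plans are the only ones that involve edge 3.
PLANS = [
    frozenset({1, 3}), frozenset({1, 3, 4}), frozenset({1, 3, 5}),
    frozenset({2, 3}), frozenset({2, 3, 4}), frozenset({2, 3, 5}),
]


def reuse_count(plans, stored):
    """Count the plans that can reuse a stored partial result.

    Assumption: a plan reuses the stored result when the stored edge set
    is a strict subset of the plan's edge set.
    """
    stored = frozenset(stored)
    return sum(1 for plan in plans if stored < plan)


# Case 1: edge 3 is joined first, so the stored partial result is {3}.
edge3_first = reuse_count(PLANS, {3})

# Case 2: edge 1 (or 2) is joined first, so edge 3 is only available
# inside the partial result {1, 3} (or {2, 3}).
edge1_first = reuse_count(PLANS, {1, 3})
edge2_first = reuse_count(PLANS, {2, 3})

print(edge3_first, edge1_first, edge2_first)
```

Under this reading, joining on edge 3 first yields 6 reuses, while starting from edge 1 or edge 2 yields 2, which matches the counts given in the two cases.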
5.2 Push-based Balancing

After several iterations of the join operations, the peers that have many neighbors can become very busy, as observed in our experiments. To achieve better load balancing, we propose another heuristic: each peer can set a threshold, based on its processing power, that denotes how many join operations it can process simultaneously. If the number of join operations a peer is processing exceeds the threshold, it can broadcast to its neighbors to let other peers execute the join operation instead.

Two cases of this heuristic exist. If one of its neighbors takes the join task, only the join plan that puts the result at the other endpoint of the connection is considered. Otherwise, the peers that should be joined send their data to a third peer, which takes over the join operation. Note that many existing join algorithms, such as pipeline join or RIPPLE join, can execute such tasks.

5.3 Implementation

A prototype with the P2P-Join operation has been built upon BestPeer [14], a generic P2P platform on which P2P applications can be developed efficiently. BestPeer integrates mobile agent and P2P techniques. While P2P provides resource sharing among nodes, mobile agents extend functions, including the P2P-Join operation. In addition, peers in BestPeer can dynamically reconfigure their neighbor candidates. Further, Chord [20] is employed to map meta-data (e.g., key, foreign key) among peers.

An experimental study has been conducted on the prototype, and the preliminary results are promising. Furthermore, with the two heuristics implemented, the performance is greatly improved compared with the original proposal.

6 Conclusion

This paper deploys relational database operations on P2P computing. First, the concept of P2P-Join is proposed, which combines tuples from relations on different peers that contain certain keywords in the query. Further, a fully distributed method to realize P2P-Join processing is devised, which inherits the syntax and semantics of the traditional join and adheres to the P2P philosophy as well. Finally, two enhancements are proposed to improve the performance of the proposed P2P-Join operation. Since relational database-enabled operation in P2P computing is still in its infancy, some other issues need to be addressed, e.g., network optimization and cache management, which are the topics of our future research.

References

[1] S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. In Proceedings of the 18th ICDE, CA, April 2002.
[2] G. Bhalotia, C. Nakhe, A. Hulgeri, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In Proceedings of the 18th ICDE, CA, April 2002.
[3] P. Druschel and A. Rowstron. PAST: Persistent and anonymous storage in a peer-to-peer networking environment. In Proceedings of the 8th IEEE Workshop on HotOS, 2001.
[4] Gnutella Homepage. http://gnutella.wego.com/.
[5] S. Gribble, A. Halevy, Z. Ives, M. Rodrig, and D. Suciu. What can databases do for peer-to-peer. In WebDB, 2001.
[6] Groove Home Page. http://www.groove.net.
[7] A. Y. Halevy, Z. G. Ives, and D. Suciu. Schema mediation in peer data management systems. In Proceedings of the 19th ICDE, 2003.
[8] M. Harren, J. Hellerstein, R. Huebsch, B. Loo, S. Shenker, and I. Stoica. Complex queries in DHT-based peer-to-peer networks. In IPTPS02, 2002.
[9] V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword search in relational databases. In VLDB'2002, 2002.
[10] P. Kalnis, B. C. Ooi, W. S. Ng, D. Papadias, and K. L. Tan. An adaptive peer-to-peer network for distributed caching of OLAP results. In ACM SIGMOD, 2002.
[11] T. Katchaounov. Query processing in self-profiling composable peer-to-peer mediator databases. In Proc. EDBT Ph.D. Workshop 2002, 2002.
[12] A. Kementsietsidis, M. Arenas, and R. Miller. Data mapping in peer-to-peer systems. In Proceedings of the 19th ICDE, 2003 (Poster Paper).
[13] MSN Home Page. http://www.msn.com/.
[14] W. S. Ng, B. C. Ooi, and K. L. Tan. BestPeer: A self-configurable peer-to-peer system. In Proceedings of the 18th ICDE, San Jose, CA, April 2002 (Poster Paper).
[15] W. S. Ng, B. C. Ooi, K. L. Tan, and A. Zhou. PeerDB: A P2P-based system for distributed data sharing. In Proceedings of the 19th ICDE, 2003.
[16] A. B. Philip, G. Fausto, K. Anastasios, M. John, S. Luciano, and Z. Ilya. Data management for peer-to-peer computing: A vision. In WebDB Workshop, 2002.
[17] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. In Proceedings of SIGCOMM, 2001.
[18] A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of the International Conference on Distributed Systems Platforms (Middleware), Germany, Nov. 2001.
[19] Seti@home Home Page. http://setiathome.ssl.berkely.edu/.
[20] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of SIGCOMM, 2001.