Sole: Scalable On-Line Execution of Continuous Queries On Spatio-Temporal Data Streams

SOLE: SCALABLE ON-LINE EXECUTION OF CONTINUOUS QUERIES ON SPATIO-TEMPORAL DATA STREAMS Mohamed F. Mokbel Walid G.
Aref
CSD TR #05-016 July 2005
vldb manuscript No.

(will be inserted bv the ctlitor)
Mohamed F. Mokbel
. Walid G. Aref
SOLE: Scalable On-Line Execution of Continuous Queries on Spatio-temporal Data Streams
the date of receipt and accrptancc sl~ouldbe inserted later
Abstract This paper pi,csents t , l ~ eScalable On-Line 1 Introduction E;cecvtion algorith~ll (SOLE. for short) for co~ltinuous
ar~d on-line e~ialuationof concurrei~tcontint~ousspatiospatiotemporal clueries over data strcallls. Inco~lli~lg teinporal data streams a.re processed in-memory against contil.~uot~s queries. The SOLE algoa set of outsta~ldi~lg rithm utilizes the scarce ineinory resource efficiently by keeping track of only the significant objects. In-memory stored objects are expired (i.e.. dropped) from memory once they beco~ne,ins-ign(ficant.SOLE is a scalable algoot~tstanding clueries share rithm where all the conti~luous the same buffer pool. 111 addit,ion: SOLE is preseilted as a spatio-temporal join bet.~veen two input streams, a stream of spatio-temporal ol~jects and a stream of spatiot,emporal queries. To cope \vit,h intei.\~als high arrival of rates of objects and/or queries, SOLE utilizes a selftnning approach based on load-shedding where some of the stored objects are dropped fro111 memory. SOLE is i~npleinei~ted a pipelined query operator that can be as co~nbiiledwith traditional query operators in a query exect~tionplan to support a witle variety of contilluous queries. Performance e x p e ~ i ~ n e nbased on a real iinplets mentation of SOLE inside a prototype of a data stream management system sIlo\v tlle scalability and efficiency of SOLE in higllly tly~lanlic environn~eilts. The rapid increase of spatio-temporal applicat,ions calls for ne\\- query processi~lgtechniques lo deal 1vit.11 the coiitii~t~ot~s ai.ri\!al of spatio-temporal data streams. Exa~nples spatio-temporal applicatio~ls of include locationaware services [28]: traffic monitoring [34]. and ei~lianced 911 services [19]. Recent research elforts for contii~t~ous sl~atio-temporalcluery processing (e.g., see [20-25,33,29, 36:39:41.32:44]) suffer from one or ]nore of the follo~vi~lg dra~vbacks:(1) They rely ~nainly011 t11e ability of storing a i d i~ldexii~g spatio-temporal data. Such i~ldexing schemes fail in practice to cope with high arrival rates of spatio-ternpoi.al data streams \\lhere only in-memorj- alqueries call be realized. (2) Most gorithms for conti~luous of the l~roposedtecliniclt~esare based on lligl~level impleineiitations 011 top of the database engiile. S11ch high level i ~ n p l c i n e i ~ t a t ,does~ not scale 11p with the large io~ niunber of inoving objects in highly dynanlic environmei~ts. hlost of the algorithms are tailored to support (3) o111~. one qllery type. From a system point of view, it is illore at.t~,active ha\;e one unifier] fraine\\:orl< that can to support ~ ~ a r i o u s cjuery types. On the other side, 1,esearch effort,s in dala strean1 n ~ a n a g e ~ ~ l esyst.eins (e.g.. see [2,G,10.11.321) focus nt mainly on pi,ocessi~lg continuous queries over data streams. lIo\vevei.. the spatial and tenlpol-al p1,operties of both data s t r e a ~ l ~ s continuous queries are overand looketl. C o ~ ~ t i n t ~ query processi~ig spatio-te~nporal ous in strea~ns lias the follo\\:i~ig distinguishing cllal-acteristics: (1) Queries as \yell as data have t,he ability to continuotlsly change their locations. Due t o this mobility: an). delay in pi.ocessing spatio-tempoi.al queries may r e s ~ ~ l t in an ol~soletealis\\.ei.. (2) An object i n q - I)e added to 01. reinovecl fl-oin the ails\ver set of a spat,io-t.empora1 c1uei.y. Coiisider m o ~ i n g \:chicles that move in and out query region. (3) The comn~o~lly n~oclel used of a. ce~,tain of slicl'ing-~lindou: queries [3.4.15] does not support con]111011 spatio-temporal cl~leries l ~ a t t a.re interested on the c~u.re~lt state of the d;ataba.se rather than 011 tile ~ t c e n t
This work \\)as support,ed in part b\; the Nat,ional Science Founda.t.ionu ~ ~ d Gra.nt.s11s-0093116.IIS-0209120, and er 0010044-CCR. hlohan~edF. hlokbel Departn~e~it Comput.er Science and Engineering, Univerof sit,?; of A1innesot.a. h,Iinneapolis. hIN, 55455 E-mail: mokbelQcs.urnn.edu \Valid G . Aref Department of Computer Science. Purdue University, \Vest Lafayette, IN 47907 E-mail: aref@cs.purdue.edu
l~istorical state. The current state of a database is a con- 2 Related Work bination of recently received data and old data that has not been updated recently. Up t o the a.ut11ors' kno\~letlge. SOLE provides t,he first. I11 t.his paper. me propose the Scalrtble On,-Line a,ttempt t o furnish query processors in data strea.111manE x e c u t i o n algorithm (SOLE, for short) for contin~~ousagement systenls with the required operators and aland on-line e\laluation of concurrent continuo~~s spatio- gorithms to support a scalable execution of concurrent temporal clueries over spatio-temporal data streams. continuous spatio-temporal queries over spatio-temporal SOLE combines the recent advances of both spatio- data streams. Since SOLE bridges the areas of spatioquery processors and data stream te~nporaldatabases and data s t r e a ~ n nlanagen~entsystemporal continuo~~s management systeins. On-line execution is acl~ievedin tems: in this section we discuss the related xvork in each SOLE by allowing only in-memory processing of incom- area sepa,rately. ing sl~atio-temporaldata streams. The scarce memory resource is efficiently utilized by keeping track of only those objects that are considered significant. Scalabil- 2.1 Spatio- temporal Databases it,)! in SOLE is achieved by using a shared 1~1ffer pool that is accessible by all outstanding queries. Flu.t.hei-- Existing algorithms for contint~otls spntio-temporal more. SOLE is presented as a spatio-temporal join be- query processing foctls mainlj- on ~naterializingincomt\\reen t , \ \ - ~n l ~ u tst,reams:, a stream of spatio-t,emporal ing spatio-temporal data in disk-ba,sed indexing struci objects and a strea.m of spatio-temporal queries. To cope tures (e.g.: hash ta.bles [9.40]. grid files [14,29,35],the with inter\rals of very high arri~ral rates of objects and/or R-tree 1221. t,he R-tree 123.251. and the TPR-tree [39. queries. SOLE adopts a s e l f - t u n m g approach based on 421). Scalable execution of conc~~rrent spatio-temporal load-shedding. The main idea is to dynamically adopt the qt~eries addressed recentl>-for centralized [14,36] and is notioil of signjficant objects based on the c ~ ~ r r e n t load. distribut,ed environ~nent,~ 18.141. Ho\ve\;er. t,he under, Thus, in-~nemorystored objects that beco~neinsign,?f- lying data structure is either a disk-based gird strucicani \\-it11 respect to the new notion may be dropped ture [14.29] or a disk-based R-tree 18.361. None of these from memory. In addition, newly incoming objects are techniques -deal with the issue of spatio-temporal data admitted to the system only if they are considered sig- streams Issues of high a r r i ~ a lrates. infinite nature of nificant. The main goal of self-tuning in SOLE is t o s11p- data. and spatio-temporal stlearns are o\~erlookedby port larger i~unlbersof continuous queries, yet \\:it11 ail tl~ese approaches. With the noti011of data streams. only approximate ans\\:er. in-memory algoritl~ms and data structures can be realTwo alternati\-e approaches exist for iinplen~entilig ized. spatio-temporal algorithnls in database systems: using The most related work to SOLE in the coiltext of table junctions or encapsulating the algoritlin~into a spatio-temporal dat,abases is the SINA fi-amework [29]. pliysica,l pipelined operator. In the first approach. \vllich is SOLE has commo~l f~unctionalities wit11 SINA where both employed by exist,iilg spatio-temporal algorithms. algo- of then1 utilize a shared grid structure to produce in[37]. cremental results in the form of posit,iue and negative rithms are inlplemented using SQL table f~ulctio~ls Since there is no straightforward method of pushing updates. Ho\ve~er: SOLE disting~~ishes itself from SINA [38]. t.lle pel-for- and other spatio-t,empoi.al query processors in the folquery predica,tes into table f ~ ~ n c t i o n s inance of this t,able function is severely limited and the lo\?;ing aspect,^: (1) SOLE is an ill-memory algorithm approach does not give enough flexibility in opt.i~i~iziiigwhere all data st.]-ucturesare memory based. (2) SOLE the issued queries. The second approach, which we adopt is equipped with load sll.edding lechniques to cope with in SOLE: is to define a query operator that can be part, intervals of high arrival rates of ~novingobjects and/or of the query execution plan. T h e SOLE operator can be queries. (3) As a result of t.11e streaming environment; combined with traditional operators (e.g., join. aggre- SOLE deals ~ r i t h new cl~allenging issues?e.g., ~ulcertainty gates. and distinct) to support a wide Yaiiety of spatio- in query areas: sca.rce memory resources, and approxitemporal quelies. In addition. with the SOLE operato~ mate cluery processing. (4) SOLE is encapsulated into the query optiinizer can support ~nultiplecandidale ex- a physical non-blocking pipelined query operator where ecut,ion plans t,he result of SOLE is produced one tuple at a time. PreThe rest of t,l~is paper is organized as follo\\:s: Sec- \.iotls spatio-temporal query processors (e.g., SINA) call \\~here result the tion 2 highlights relat,ed work. The basic concepts of be implemented only as a table f~mction is SOLE are discussed in Section 3. The SOLE a l g o r i t l ~ ~ ~ i produced periodically in batches. is presented in Section 5. Approxiinate cluer! processing in SOLE via load shedding a,nd seU t.uning is presented in Section 6. Experimental results that are based 011 a real 2.2 Data Streain hlanagenlent Systems implementation of SOLE inside a data stream manage~neilt syst'em are presented in Sectioil 7. Finall!.. Section 8 Existing prototypes for data stream management sysconclt~des paper. the teins 11.10.12.321 aim to efficiently support continuot~s
>
queries over data streams. Ho\vever. the spatial and teinporal properties of data streams and/or continuo~~s queries are overlooked by these protot,ypes. R;ith lim- I n p u t . The input to SOLE is two streams: (1) A stream ited memory resources. existing stream query processors of spatio-tenlporal data that is sent fi-0111 continuously adopt the concept of slldlng \vindo\~sto limit the num- inoving objects with the forinat ( O I D . Loc. T). where ber of tuples stored in-memory to only the recent tu- O I D is the object identifier. and Loc is the cllrrent locaples [3.4.15]. Such nlodel is not appropriate for many tion of the moving object a t tinie 7' hIo\.ing objects are spatio-temporal applications where the focus is on the required t o send updates of their locations periodicallj. current status of the database rat'her than on the recent, Faill~i-e do so results in considering the moving obto queries over spatio- ject as disconnected. (2) A streain of conti~iuous past. T h e 0111,~ work for c o n t i n ~ ~ o u s quel-ies. teinporal streains is the GPAC algorithm [2i]. However. Queries can be sent either froin moving objects 01- from GPAC is concerned only with the execution of a single external entities (e.g., a traffic adininistrato~.). genI11 query. In a typical data stream eral. a query Q is represented as ( Q I D . R e g i o n ) . \\?l~ei-e outstanding c o n t i n ~ ~ o u s environment. there is a huge number of outstanding con- Q I D is t,he query identifier. and R e g i o n is the spatial tinuous queries 111 \\-hich GPAC cannol afford. area covered by Q Scalable execution of continuous queries in tradi- O u t p u t . SOLE employs an incremental evaluation tional data streams aim t o either detect common subex- paradigm similar to the one used ill SIKA [29]. T h e of pressions [11.12.26] or share resources a t the operator inain idea is to avoid continuous reeval~~ation conti11~1the level [3.13.161. SOLE exploits both paradigms where 011s spatio-temporal queries. Instead. SOLE ~~pclates evaltlating m~lltiple spat io-t einporal queries is performed query result by computing and sending only updates of as a spatio-temporal join bet\veen an object streain and the pre\liously reported answer. SOLE distinguish betypes of query updates: Positive l~pn'olesand a query stream \+-11ilea shared meinory resource (buffer tween t \ \ : ~ pool) is inaintained to support all continuous queries. negative .updates. A positive update iiidicates that a cerLoad shedding in data streain manageinent systems is tain object needs t o be added t o the result set of a ceraddressed recently in [5.43]. The inail1 idea t o add a tain c1uei.y. 111contrast, a negative update indicates that. special operator to the query plan t o regulate the load a certain object is no longer in t,he answer set of a cerby discarding ini important incoming tuples. Load shed- tain query. Thus, the output of SOLE is a streain of O ding techniques in SOLE are distinguished from other tuples urith the f o r ~ n a t( Q I D . f: I D ) . ~vllel-eQ I D is approaches \+.here in addition to discarding some of the the query identifier t h a t would receive this output tuple, ii~coming tuples, SOLE voluntary drops some of the tu- z t indicates whether this output is a positi,ue or negative updates. A positive/negative update indicates the addiples stoi-ed in-memory. tion/removal of object O I D t,o/fronl cluery Q I D. T h e most related 11-ork to SOLE in the context of data
stream management s!.st,eins is t,l~e NiagaraCQ framework [12]. SOLE has coininon functionalities urith NiagaraCQ where both of them utilize a shared operator to join a set of objects x i t h a set of queries. However. SOLE distingt~ishesitself from NiagaraCQ and other s\.stems in the following: (1) As d a t a streain ~nailageillent a result of the spatio-teinporal en\~ironment.SOLE has t o deal with new cl~allei~ging issues: e.g., moving queries: uncertainty in query areas, positive and negative updates t o the query r e s ~ ~ l(2) In a highly overloaded syst. tem: SOLE provides approximate results by employing load sh,edding techniql~es.(3) In addition t o sharing the query operator as in Nia.garaCQ, SOLE share memory resources a t the opera,tor level.
3.2 SOLE as a Pipelined Operatoi SOLE is encapsulated into a physical pipelined operator that can interact with traditional query operators in a large pipelined query plan. Havil~g SOLE opei.atol- eithe ther in the bottom or in the middle of the query pipeline requires that all the above operators be equipped with special mechanisnls to handle negative tuples. Fortunately: recent data stream inanageineilt systems (e.g., Borealis [I]: NILE [18]: STREAhI [32]) ha,ve the ability t o process such negative tuples Basically. negative tuples are processed in traditional opei.ators as follows: Selection and Join. operators handle negative tuples in the sa.me \\:a?. as positice tuples. T h e only difference is that the output will be in the form of a negative tuple. Aggregates update their aggregate functions by considering the received negall.ue tuple. The Distinct operator reports a negntive tuple at the ontput, only if the corresponding positive tuple is in the recently reported result. For detailed algorithms about haildliilg the negative tuples in various traditional cluery operators. the i.eadei is referred t o [17].
3 Basic Concepts i n S O L E
I11 this section. we discuss the basic concepts of SOLE that i n c l ~ ~ dThe i n p u t / o ~ ~ t model. supporting varie: p~~t ous queries: SOLE pipelined operat,or. and the SQL syntax.
3.3 Supporting Various Query Types SOLE is a unified frame\vork that deals \\-it11 range queries as \\!ell as k-nearest-neighbor (kNN) queries. In addition SOLE supports both stat,ionary and moving queries with the same fi-ame\\:o~.k. Moving Queries. Each moving query is bounded to object A1 sublnits a jocal object. For example, if a mo\~ing a query Q that asks about objects within a certain range of A J . then A.I is considered t,he jocal object of Q. A 1110~ing query Q is represented as ( Q I D : FocalID. Region): \\711ere Q I D is the query identifier, F o c a l I D is the object identifier that submits Q I and Region is the spatial area of Q. k N N Queries. A kNN cjuery is represented as a circular range query. T h e only difference is that the size of the query range mag grow or s11rink based 011 the nlovenlent of the query and objects of interest. Initially. a kNN query is sub~nitt,ed SOLE with t,l~e to format ( Q I D . center. k) 01- ( Q I D . Foca.llD. k) for stationary and moving queries, respecti\lely. Tllus, the center of the query circular region is either sta.t.ed explicitl>. as in stationary queries or inlplicitly as the current location of the object F o c a l I D in case of moving clueries. Once the kNN query is registered in SOLE, the first incon~ingk. objects are considered as the initial query answer. The radius of the circular region is determined by the distance from the query center to the current kt,h farthest neigh bor. Once the kNN query determines its init,ial circular region, the query execution continues as a regl~lar range query. yet with a variable size. \I:henever a newly con~ing object P lies inside t,he circular query region, P removes the kth farthest neighbor fro111 the ails\\-er set (\\-it11 a negative update) a.nd adds itself to the answer set (with a positive update). The query circlilar region is sh.runk t o reflect the new kt11 neighbor. Similarly. if an object P , that is one of the k neighbors: updates its location to be outside the circular region. we expand the query circular region to reflect the fact that P is conside~.ed the farthest kth neighbor. Notice that in case of expanding the query region, we do not output a.ny updates.
Static range query ( x l : yl . e 2 : y2): where (sl yl) and : (x2: represent the top left and bottoln right cory2) ners of the rectangular range query. - h.lo~-ing rectangular range query ('A.1': I D : xdist. ydist)t where 'AT' is a flag indicates that the query is moving: I D is the identifier of the query ,foca,lpoint. xdist is the length of the query I-ectangle: and ydist is the width of the query rectangle.
-
Similarly, the knn-clause may 11a.ve one of two foi-111s:

-
Static X-NN query (k.sL.: y): where k is the ilu~nber of the neighbors to be maintained: and (s! is the y) center of the query point. hloving kNN query ('A,ill: k . I D ) ; where 'Ad' is a flag indica.tes that the query is nloving, k is the number of neighbors to be maintained: and I D is the identifier of the query jocal point
4 Single Execution of Continuous Queries in
SOLE
To clarify many of the ideas used in SOLE, in this section: we present the SOLE in the context of single query execution 1271. In the next section. \\re sho\\r how SOLE can be generalized to t,he ca.se of evaluating n~ultiple concurrent continuous spatio-temporal queries. Assuming that for a query Q: the query answer is stored in Q.Answer, then: whenever SOLE receives a data input of object P: SOLE distinguishes among four cases:
-
Case I: P E Q.An.swer and P satisfies Q (e.g., Q l in Figure l a ) . As SOLE processes only the 11pdates of the previously reported result, P will neither be processed nor will be sent to the user. Case 11: P E Q.Ansuier and P does not sa.tisfy Q (Figure l b ) . In this case: SOLE reports a negati.ue update P'- to the user. Case 111: P @ Q.Ai.su!er a.nd P satisfies Q (Figure l c ) . I11 this case: SOLE reports a positive update t o the user. Case IV: P @ Q.Answer and P does not sa.tisfy Q (e.g., Q2 in Figure l a ) . 1 1 this case: P has no effect 1 0 1 Q . Thus, P will neither be processed nor \\.ill be 1 sent t o the tiser.
3.4 SQL Syntax Since SOLE is inlplenlented as a query operator. we use the following SQL syntax that in\.oke the processing of SOLE SELECT select,-clause F O jrom-clnuse RM W E E where-cla.use HR INSIDE in-clause k N knn-clm~se N The ~n-clause ]nay have one of two forins:
011 the other side, whene\rer SOLE receives an upc1a.t.efrom a ~ n o \ ~ i cluery, it classifies ii~-1ne111ory ~lg stored objects into four ca.tegories C1 t o Cq where: (1) C1 c Q.An.su!er and C 1 satisfies the new Q.Region (e.g., white objects in Figure Id). SOLE does not process any of the objects in C1. (2) C2 c Q.Ansu:er. & does not sat(e.g., gray objects in Figure Id). isfy the query ~,egion For each data object in C2: SOLE produces a negatl:ve updat.e. (3) C3 @ Q.A~ssu!er and C3 satisfies the new Q.Region. (e.g., black objects in Figure Id). For each data object in Cg, SOLE produces a positive upda.te. ( 4 ) CI, @ Q.Answer and C4 does not. satisfy Q.Region. SOLE does not process objects in C4.
(a) Nothing
m) Negative
( c ) Positive
(d) Moving Query
Fig. 1 Positlvr/Negali>e updates in SOLE

la1 Snapshot
at t i m e
T,,
Ibl Snapshot at t i m e TI
(c) Snapshot at t i m e T ,
P , . P , .wemoved
4.1 Uncertainty in Spat,io-tempoi.al Queries
9 , .P, are m o v e d .
Fig. 2 U~~cert.ai~~ty in spatio-temporal queries.
A key feature of SOLE is to utilize the scarce inein0i-y resource efficiently by keeping track of only those objects that satisfy a t least one outstanding cluery. However, a straightforward application of this feature may result in uncertainty areas. The .uncertainty area of a cluery Q is defined as fol10\\rs:
Definition 1 The ~mcertaintyarea of query Q is the spatial area of Q that mav contain potential moving ob~ects that satisfy Q. with Q not being a\vare of the contents of this area.
Figure 2 gives three consecutive snapshots of seven objects Pl to Pi. a moving range queries Q1. and a knearest-neighbor query ( k = 2) Q 2 . Two types of uncertainties are distinguished:
the uncertainty area of a continuous query Q and cnche in-111en10ry all moving objects that lie in Q's ~u~certaiiity area. \\:hei~e\~er uncertainty area is produced, Ive probe an the in-memory cache and produce the result iininediately. A conservative approach for caching is t o expand the query region in all directions with the maximtun possible distance that a moving object can travel bet~veenally t\vo consecnti\le updates. Such conservative approacl~ coniplet,ely avoids uncertainty areas \\.here it is guaranteed that all objects in the uncerta.int!~area a.re stored in the cnche. Figure 3 gil-es an example of using caching t o avoid ui~ce~taint!.in moving queries. The shaded area represents the query i.egioii. T h e cached area is repl.esented as a dashed rectangle. AIoving objects that belong to the query answer or to tlie query's cache area are plotted as white or gray circles. respectively. At time To (Figure 3a), two objects satisfy the query answer ( P I : 2 ) : three obP jects are in tlle cache area (P3: P5). and t ~ \ ~ o P4: objects outside the cache a.rea (PC, Pi). Only objects that either in the query or t,he cache a.rea,are stored in-memory. At Tl (Figwe 3b), all objects change their locations. HOWever, I\-e only report PC and P The ca.che area is up. : clated t o contain (P2: PC). Changes in the cache area P4: do NOT result in any updates. At T2 (Figure 3c): the query Q moves ~vitliinits cache area. Two updat,es are sent to the user: PC a n d P 2 . The cache a,rea is adjusted employt o contaiii P3 and Po only. Notice that wit,l~out : ing the cache area. we would miss P . The c o n s e ~ ~ v n t l v e caching approach requires oilly t.he knowledge of the ma,xiinum object speed, whicl~is tvpicall!; available ill moving object applications (e.g., moving cars i n road net\\sork ha\:e limited speeds). This is in contl.ast to all validity region approaches (e.g., the safe regio~i[36]. the ~ a , l i d region [dG]. a.nd the No-Action region [45])t.liat require l11e kno\vledge of the locations of other objects. This information is not available in our case since SOLE is a\vare only of objects that satisfy the query predicat,e. Thus. validity region approaches are not applicable in the case of spatio-teniporal streams.
Moving query Q1. At time To (Figure 2a): Pl is outside the area of Q1. Thus: Pl is not physically stored in the database. Recall that only objects that satisfy the query region a.re stored in the database. At time Tl (Figure 2b), Q 1 is moved. The shaded area in Q1 represents its uncertainty area.. Although PI is inside the new query region, PI is not reported in the cjuery answer where it is not actually stored. At T2 (Figure 2c): Pl moves o ~ of the query region. Thus, t PI is never reported a t the query result, although it was inside the query ~ e g i o nin the time interval [Ti:7'21. 2. Stationary query Q2. At time To. the answer of ~.~ Q2 is ( P 5 ; C ) . The cluery circular region is centered P at Q 2 with its radius being the clistance from Q2 t o P5. Since Pi is outside the cluery spatial region, Pi is not stored in the data.base. At TI: P5 is moved far from Q 2 . Since Q2 is aware only of P5 a.nd P6:~e extend the region of Q 2 to include the new location of P5.Thus, a n uncertainty area is produced. Notice that Q 2 is unaware of Pi since Pi is not stored in the database. At T2: Pi moves out of the new query region. Thus: Pi never appea.rs as an answer of Q2: although it should have been part of the ans\ver in t,he time inter\lal [TI: 7;].
4.2 Avoiding Uncertainty in SOLE SOLE avoids uncertainty areas in spatio-temporal queries 11sing a caching technicl~~e.h e main idea is t o predict T
--
" a 7
-
. --
P.............,
'.
.L.
- - - .....u -- ---: .
.
I
Stream of Moving Objects (P) Moving Spatio-temporal Objects (P) Queries (9)
(b) Shared operator and shared
,
...... J ......
la)
Snapshot at
Irme T,,
m) Snapshot at tlme TI
AU objects are moved
Ic) Snapshot at tlme T? The query Is moved
Fig. 3 Avoidiilg uncertaint~in SOLE.
5 SOLE: Scalable On-Line Execution of Continuous Queries

I11 a typical spatio-temporal application (e.g., locationaware servers). there are larae numbers of concurrent spatio-teml>oral c o n t i n u o ~ ~ s queries. Dealing wit11 each query as a separate entity (e.g.. as discussecl in Section 4) wol~ldeasily conslune the s~.steni resources and degrade the system performance. In this section, we present the scalability of SOLE in terms of handling large numbers of coiicu~-relit cont.inuous clueries of mixed types (e.g., range and kNN queries). Without loss of generality, all the discussioil in the rest of this paper is presented in the context of statiollary and mo\~ing range queries. The applicability t o k-ilearest-neigllbor queries is straightforward as described in Section 3.
(a) Separate query plan and buffer for each query
buffer pool for all queries
Fig. 4 O\'erview of shared executjoll in SOLE.
5.2 Shared hlemory Buffer
SOLE nlaintains a simple grid structure a.s an in-memory shared buffer pool among all continuous queries and obinto two jects. The shared buffel- pool is logically d i ~ i d e d parts: a query buffer that stores all outstanding continuous queries and an object buffer that is concerned with n~ovil~g object's. In addition t,o the grid struct~u-e. SOLE employs a hash t,able h t o index moving objects based on their identifers. To o p t h i z e the scarce memory resource. SOLE einploys two main techniques: (1) Rather than redundantly object P multiple times with each query storing a nlo\~ing Q, t,llat needs P: SOLE st,oresP at most once along with 5.1 Overview of Shal.ing in SOLE a reference counter tha.t indicates the number of continuous queries that need P . (2) Rather than storing all movFigure 4a gives the pipelined execution of N queries (Q1 t o Q N ) of various t ~ . p e s wit11 no sharing. i.e.. each query ing object,^. SOLE keeps t,rack \vit,ll only the signijicanl is considered a separate entity. T h e inplit data stream objects. Insignificanl objects are ignored (i.e.. dropped) goes through each spatio-temporal quei y operator sepa- froin memory. Sign(ficant objects are defined as follo\x~s: rately. \Vith each operator. we keep track of a separate Definition 2 A moving object P is considered signifbuffer that contains all the objects that are needed by icant if P satisfies any of the following two conditions: this query (e.g.. objects that are inside the query region (1) There is at least one outstanding continuous query Q or its cache area). \Vitll a separate buffer for each sin- that shows interest in object P (i.e.. P has a non-zero gle query. the memory can be exhausted with a small referelice counter). (2) P is the focal object of a t least number of continuous queries. one outstanding continuous query. Figure 4b gives the pipelined execution of the same N queries as in Figure 4a. yet with the shared SOLE \?ie define when a quel-,y Q shows interest in an operator. The problem of evaluating concurrent continu- object P as follo\vs: ous queries is reduced t o a spatio-temporal join between objects and a stream of Definition 3 A continuous clue1.y Q is interested in obtwo streams; a strea,m of n ~ o \ ~ i n g continuous spatio-temporal queries. The sh,ared spatio- ject P if P either lies in Q.s spatial area or in Q's cache teinporal join ope]-ator 11a.s a shared buffer pool that is area. accessible by all continuous queries. The output of the Ha\,ing tlle pi.evious definition of srgnuficont objects, sh,ared SOLE operator has the form (Q,: *P,) which inassertion: SOLE continuously maintains t,lle f o l l o ~ ~ i n g dicates a n additioi~ removal of object Pj to/from query or Q,. T h e shared SOLE opel.a.tor is follo\ved by a sp1i.t op- Assertion 1 Only significant objects are stored in the erator t h a t distributes the output of SOLE either to the shared meiiiory bu,,fler users or to the va.rious query operators. The split operator is similar t o the one used in NiagaraCQ [12] and it To always satisfy this assertion. SOLE continuously is out of the focus of this paper. Our focus is in realiz- keeps track of t,he following: (1) A ne\vIy incolning data ing: (1) T h e shared memory buffer, and (2) T h e sha.red object P is stored in n~einoryoi11y if P is significant: (2) At a,ny time. if an object P that is already st,ored SOLE spatio-temporal join operator.
Procedure Incoi~~i~~gNeu~Object P ; GridCell CP) (Object Begin

1 . For each Query Q; E C p 4ArD P E QF (a) P. RefCount++ (b) i f ( P E Q,) then output (Q, . + P ) . 2. i f (P.RefCoun,t)then store P i n C p an.d in hash table h.
End. Fig. 6 Pseudo code for receiving a nen- value of P . Procedure Updat.eObj(Object P,/d,P. GridCell Cpo,, ,Cp) Begin 1 . For each query Q , E P.FocolList. UpdateQuery(Qi) 2. Let L be the line (Pold. P) 3. For each quenj QQ2 (Cpoltl C p ) E U (a) i f Q; intersects L , then
-
nhring
TOS
queries
querj-
Stream of moving objects [PI
stream of spetiotemporal querlcs (Ql
i f P 6 Q; t h e n Output ( Q , .+ P ) : i f P o l d $ Q;: ! P. RefC0un.t + + - else Ovtpui. ( Q i . - P ) : i f P @ Q; then P.RefCount-in the shared buffer becomes insignificant: we drop P (b) e l s e i f ~i in.tersects L immediately from the shared buffer. - if P E ~i then P.RefCount++: e l s e Signijcant moving objects are hashed t o grid cells P. RefCoun,t- based 011 their spatial locations. An entry of a signij- 4, if ( ! P . R e f ~ & n t ) then delete an,d ign,oreP: return, icant moving object P in a grid cell C has the form 5. i f (Cpc,,,l# Cp) t h e n nlove PC,/,/ froln Cpo,, to C p . ( P I D , Loco.tion.,R e f C o u n , t . FocalList). P I D and Loco.tionG Update the location. of P o / d to that of P i n CP. are the object identifer and location. respectively. Ref Coun&nd. indicates the number of queries that are interested in P. F i g . Pseudo code for updating P ' s locatioll, FocalList is the list of active moviilg queries that have P as their ,focal object. Unlike d a t a objects that are stored in only one grid cell. contiiluous queries are stored in all ory, (2) Update of the location of object P:(3) A new grid cells that overla'p either the query spatial area or the \Ipdate of the regiorl of a query Q. (4) que1.y cache area. A query elltry in a grid cell collt,a'ins query Q. Figures 6: 7. 9: and 10 g i ~ the pseudo ~e onl>r the query identifier ( Q I D ) . The spatial regioll code of SOLE up011 recei\~illgeacll input type. The deeach query is stored separately ill a global l o o k l l ~ table. tails of algorit,hllls are described b e l ~ \ \SOLE makes ~.
Fig. 5 Shared join operator in SOLE.
5.3 Shared Spat'io-temporal Join Operator
O v e r v i e w . Figure 5 puts a magnifying glass over the shared spatial join operator in Figure 4b. For any incoining data object. say P: the shared spatial join operator consults it's query buffer t o check if any query is affected by P (either in a positive or a negative way). Based on the result, ,ve decide either to store in the object buffer or to ignore p all,j delete old location (if ally) from the object buffer. on the other lland? for ally illcolllillg co,ltilluous qLlel.y, say-Q: first we store Q or update Q : old locatioll (if ally) ill tile query buffer, Then: we stilt the object buffer to check if ally of t,he objects needs froln Q : allswer, ~~~~d ~ t o be to or this opera,tioll. jll~lllelllory stored objects lllay beinsignificant, hellce, are deletecl illllnedjately from dithe object btlffer. Statiollary queries are s~lblllit,ted l.ectlJ, to tile spatial joill operator, ,xrllilelllovillg clueries a,re gellerated from the nlovelnellt of their focal objects. A l g o r i t h m . Based on the data stored in the shared buffer: SOLE distinguishes among four types of data inputs: (1) A new data object P that is not stored in inem-
use of the follo\ving notations Q indicates the extended queiy region that covers the cache area so that Q C Q. C Q . CjQ are the set of grid cells that are covered by Q respectively. C p represents a single grid cell that, and covers t,he object P. I n p u t T y p e I: A n e w o b j e c t P. Figure 6 gives the pseudo code of SOLE upon receiviilg a nelv object P in the grid cell C p (i.e.; P is not stored in memory). P is test,ed against all the queries that are stored in C p (Step 1 ill Figllre 6). For each query Qi E CP:0 1 1 1 ~three cases can take place: (1) P lies in Qi but not in Q. 111 this ~ case, we need only to illcrease t,he refereilce counter of P t o indicate that there is one inore cluer!r interested in P (Step l a ill Figure 6). Notice t'hat no output is ~ r o d u c e d in this case since P does not satisly Q i . (2) P satisfies Q i . I11 this case, in addition to iilcreasii1g t,he countel-. u7eo l ~ t p l ~ tpositive 11pdate that indicates the a addition ol' P t'o the answer set of Q, (Step l b ill Figure 6 ) . In t,he above tnTocases. P is storecl in the shared buffer as it is considered significant. (3) P neither satisfies Qi nor lies in Q,. Thus, P is simply ignored as it is insign,ijcant. I n p u t T y p e 11: An u p d a t e o f P. Figure 7 gives the pseudo code of SOLE upon receiving an update of ob-
0:
Procedure UpdateQrlery(query Qold: Begin Q)

-
la) All
cases of updating P's location
m)Actloo taken for each case
Fig. 8 All cases of updating P's 1ocat.ion Procedure St,at,ional.yQuery(queryQ) Begin

-
For each grid cell cj E CQ 1. Register Q in c, 2. For each object P E c j AND P, E Q i - P.Ref'Countt t: if P E Q then output (Q: + P )
For each object P E ( e ~ ,n, e~ ~ i Q) 1. if P, E Qoldthen - if P, @ Q then (01~tpul(Q.-P,). if P @ i then (P,.Re,fCoun,l--: if (!P, .RefCounl) thendelete(P;))) 2. e l s e if P E Q then (Output (Q.+P;), if P @ Q"/J i i then Pi.RefCoun,ttt ) 3. else if P ; E Qold AND P @ Q then i (P, . Ref'Coun,t--: if (!Pi. RefCount) then delete(P,)) 4. else if Pi E Q AND P i @ ()old then Pi.RefCount+t. Register Q in cQ unregisler Q f'roi~, cQ - rQ
eQ
End. Fig. 10 Pseudo code for updating a query.
End. Fig. 9 Pseudo code for receiling a new query Q
In addit.ion, objects t h a t satisfy Q ~.esultsin lie in pl.oducing positive updates. ject P ' s location. 7:lle old locat,ion of P is retrieved from Input Type IV: An update of Q's region. Figthe hash table h . First. we evaluate all moving queries ure 10 gives the pseudo code of SOLE upon receiling (if any) that have P as their ,focal object (Step 1 in Fig- an update of a ~noviilgquery region. All stored objects ure 7). Then: we check all the queries that, belong t o in all cells that are covered by the old and new regions eitller Cp or Cpr,,,, (Step 3 in Figure 7) against the line of Q are t.est.ed against Q. Figure l l a divides t.he space Figure 8a gives ]line different covered by the old and new regions of Q into seven reL that connects P and Po/,,. and P gions (RI-Ri). T h e action taken for any point t,llat lies cases for the intersection of L \\;it11 Q where Pold are plotted as white and black circles: respectively. Both in any of these regions is given in Figure l l b . Similar to Poll, P can be in one of the three states: in: cach.e, or Figure 8b. a ~.egion could have any of the t h e e states and Ri out that indicates that P satisfies Q: in t,l~e cache area of in: cache. or OIL^ based 011 whether Ri is inside Q: is in respectively. The action taken the cache area of Q: or is outside Q. Basically: no action Q. 01- does not satisfy QQ: for each case is given in Figure 8b. Basically, if there is is taken for objects in any region Ri that n~aintainsits no cl~ange state from Poldo P (e.g.. L1: L5: and Lg): state for botli Q and Qoid (e.g.: R4). If a region R i is inof t no act,ion will be taken. If POld in Q : ho\vever, P side Qold:but is not in Q: (e.g., R2 and R3):we olit,put a was is not, (e.g.; L 2 and Lg) \\re o u t p l ~ t the negative update negahve update for each object in Ri. We decreinent the (Q: - P ) . The reference counter is decreased only when refereilce counter of these objects only if they lie in the Pold of interest to Q while P is not (e.g., L g and L G ) . region that is out of the new cache area (e.g.: R z ) (Step 1 is Notice that in the case of L2: we do not need t o decrease in Figure 10). Also, the reference counter is clecrenlented the reference counter \\:here althol~gh does not satisfy for all objects in the region that are in the old cache area P Q: P is still of interest to Q as P lies in ~ i Also, in but are out of the new cache area (e.g.. R 1 ) (Step 3 in . the case of L6: \\re do not need t'o output a negative up- Figure 10). Similarly. the reference counter is increased da.te: ho\vever \\re decrease the reference counter. In this for regions R6 and Ri a:hile a positive output is sent for are case, since P and Porlr not in the a,ns\ver set of Q: the points in regions R5 and R6. Notice that whenevelthere is no need to update the answer. Similarly, with we decrement the reference counter for any moving oba symmetric behavior, \\-e output a positive update in ject P: we check whether P becomes insignificant. If this t,he cases of L1 and Li a.nd we increment the reference is the case: we iinlnecliately drop P from mernory (e.g.: countel- in the cases of L i and Ls. After testing all cases, Steps 1 aild 3 in Figure 11). Finally: Q is registered in \\:e check \\:])ether object P becomes insignificant. If this all the ne\v cells that are covered by the ne\\l region and is the case, \\re immediately drop P from memory (Step 4 not the old region. Similarly, Q is ui~registeredfrom all in Figure 7). If P is still signi,fica.nt, we update P ' s lo- cells that are covered by the old region a.nd not the ne\y cation and cell (if needed) in the grid structure (Steps 5 region. and 6 in Figure 7).
Input Type 111: A new query Q. Figure 9 gives the pseudo code of SOLE 11pon receiving a continuous statioilary query Q. Basically, \ye register Q in all the grid cells that are covered by Q. In addition, we test Q against all data objects tha,t are stored in these cells. We increase the reference collllter of only those objects that
0.
6 Approximate Query Processing in SOLE

Even with the scalability features of SOLE, the memory resource may be exhausted a t intervals of ~mexpect.ed massive nulnbers of queries ailcl moving objects (e.g..
%ew
f
Update ----........--.-...... Statistics
Shared Join Operator
1
(1) Trigger (Memoryis almost full) (2) Update eritcria
3
Y
la) All cases of updating query region
b)Action taken for each case
Load
Fig. 1 All cases of updatil~g 1 Q's region.
during rush hours). 'l'o cope with such intervals: SOLE Objects Queries is equipped with a se!f-2,uning approach that tunes t'he Archit'ecture of in 'OLE. memory load to support a 1a.rge.number of concun-ent Fig. queries, yet with a.11 approxiinate ails\17er.The main idea is t o tune the definition of significant objects based on wit11 all the cache area. if the systein is st,ill o~erloaded, the current work1oa.d. B. adapting the definition of sig! ancl we ]la.\-eriot reached to the inilli~n~un permissable acnificant objects. the ineinory load \\;ill be sh,ed in two cui-acy yet. we start t,o reduce Q's area itself. T~IIIS: tlle ways: (1) In-memory stored o1,jects will be re~risit.ed for notion of significant objects is adopted to be t,hose tuples of the new ineanii~g s.ignf.fica~ztobjects. If a.n insigni,ficnn.t t,liat lie i l l the reduced query area of at 1ea.st one coi~tinobject is fo~111d: 11-ill be shed from memory. ( 2 ) Soine it 11011s outstanding query. By reducing the cluer!. sizes of of the newly input data will be shed a t the input le~rel. all outstantling queries, objects that a,re outside of the Figure 12 gives the a.rchitectu1-eof self-tuning ill SOLE. reduced area and are not of interest t o ail!. other query Once the shared join operator incurs high resource coil- are i m m e d i a t e l y dropped fi-om inenlory anti the corresuinptio~l, e.g.. the ineinory becoines allnost full, the join sponding negative updates are sent. During the course of operat'or triggers the execution of the load shedding pro- exec~~tion. gradually increase the query size to cope \ve cedure. T h e load shedding procedure may consult some \vitll the memory load. Finally, when t,l~e system reaches statistics that are collected during the course of execu- a stable stat.e: we retain the origi~lal query sizes. tion to decide 011 a ne\v n~eaningof significant 01,jects. Q u e r y load shedding has two main ad\.antages: ( I ) It While the shared join operator is running with the ne\v is intt~iti\*e and simple t o implenlent where there is no definition of s.igni,ficanL objects. it may send updates of need to ma.intain any kind of statistical information: and the current memory load t o t,he load shedding pr0cedm.e. (2) Insignificant objects are immediately dropped from T h e load shedding proced1u.e replies back by con tin^^- memory. 0 1the other side, there are t\vo main disadva111 ously adopting the notion of sign(fican2 objects based on tages: (1) The query load shedding process is expensive, the continuously changing nlelnory load. Finally, once where it scans a.11 stored objects and queries. Tliis exthe memory load ret111.11~o a stable state; the shared haustive behavior results in pause time intervals where t join operator retains the original meaning of significant the system cannot produce output nor process data inobjects and stops the execution of the load shedding pro- puts. (2) Alt,hough the query accuracy is gliaranteed (ascedure. Solid lines in Figure 12 indicat,e the m a ~ ~ d a t o r ysu~ning uniforin data distribution), there is no guarantee steps that should be taken by any load sh.edding tech- of the a.il1ount of reduced memory. Assume the case that nique. Dashed lines indicate a set of operations that may the reduced area from a query Qi lies completely inside or may not be employed based on the underlying load another query Q,. Thus, even though Qi is reduced, we shedding technique. 111 the rest of this section, we pro- cailllot drop tuples from the reduced area \\-here they are pose two load shedding techniques, namely q u e r y load still needed by Qj. Thus, the accuracy of Qi is reduced, shedding and object load shedding. yet the a.mount of memory is not. 6.1 Query Load Shedcling T h e main idea of q u e r y load shedding is t o negotiate the query region ~ v i t hthe user. R ~ h e n e \ ~ a rquery. say Q. is e acc~rac?. submitted to SOLE, Q specifies t,lle n l i n i n ~ ~ l ~ n tha.t is acceptable by Q . Initially. the submitted query Q is evaluated with complete accuracy. However. when the systein is overloadecl, Q's accuracy is degraded to its minimum perlnissible a.ccuracy. Reducing the accuracy is achieved by shrinking Q's cache a.rea fro111 all directions to have a smaller cache a]-ea..After we are done
6.2 Object Load Shedding Tlle main idea of object load shedding is t o drop objects that have less effect 011 the average query accuracy. Thus. tlie definition of significant objects is adopted t,o be tJliose objects that are of interest t o a t 1ea.st k queries (i.e., objects with reference counter greater than or equal k). IVotice that the original definition of significant objects implicitly assumes that k = 1. A key point in object loa,d shedding is that we do not perform an exhaustive scan to drop insign{ficant objects. Instead, insignificant
objects are lazily dropped \vhene\.ei. they get accessed later during the course of execution. Such lnz?j bel~aviol. completely a\.oids the pause tilne intervals in query load shedding. 111 contrast t o query load shedding, in object load shedding: a7eguaralltee the 1.ec1uced I1lelnol.y load. Duril~g course of execut,ion: \ve monitor the inelllthe ory load aild clecrease/increase k accordillgly. Once t,he syst,em stabilizes and returns t o its original st,ate.,11-e set k = 1 t o ret.ain the original executiol~of SOLE. Dete1.mil~ing the t,hreshold value k is acl~ie\~ed mailltaining by a statistica.1 table S that keeps track of the 11unlberof obof jects t h a t satisfy a certa.il1 n~lmber queries. Assmnillg that \\re will ne\:er drop an object that 11a.s a reference co~mt,er greater than IV: t,llell S can be represented as Fig. 13 Greater Lafayette. Indiana. USA. 1' a n array of ?\ ]lumbers where the jtll entry in S correb r sponds t o the n ~ u ~ ~ of emoving objects Lhal are of ill\Ve use the Network-based Genemtor of A4oving Obterest to j queries. \\:hene\:er the system is o\:erloadecl, jects 171 to generat,e a set of mo\ril~gobjects and movwe go through S t o get the millimuln k that a,chieves the ing queries in the for111 of spatio-temporal streams. The required reduced load. input t'o the generator is the road inap of the Greater Lafayatte ( a city in the state of Illdia,na, USA) given in Figure 13. The output of the generator is a set of nlov6.3 Loatl Shedding \vitl~ Locking ing points t , l ~ a move on the l.oac1 network of the given t cit,y. h.Ioving objects can be cars: cyclists: pedestrians, Degenerate cases may affect se\:erely the bellaviol. of load etc. Any moving object call be a f o a l of a moving query. sheddillg. Col~sitlert,he case of a query Q that has only Uilless ~nentionedother~vise,we generate 1101< lnovillg one object P as its ansu-er \\~11ile is not of interest t o objects as follows: Initially, u:e generate lOIi moving obP any other quer!.. By applying object load shedclii~g. mill ,jects from the generator, then we run the gellerator for P it be dropped 1~11e1-e is of intei-est to only one quer!. Q. 1000 time ~ m i t s At each time unit, \ve generate new 100 . Thus, the accuracy of Q is dropped to zero. To allevi- nloving objects. Moving objects are required t o report ate such problem, we use a locking technique. Basically: their locations every time unit T. Failure to do so results each query Q has a tl~resholdn where if Q has less than in disconnecting the moving object from the server. n object,s in its a.nswer set.. all the 71 object,^ are locked. T h e rest of this section is organized as follows. SecLocked objects cio not participate in the statistical table t,ion 7.1 studies the effect of the cache size and the gain S . Once an object is locked. t,he corresponding enti-y in S of llaving SOLE as a pipelined operat,or in terms of single is updated. \?71~ene\,er lazily drop objects fro111 mem- query execution. In Section 7.3, \\re stud!; the scalability we ory, we make sure t h a t we do not drop any locked object. of SOLE. Finallh Section 7.6 st,udies the performance of T h e concept of lockin,g can also be geileralized to accom- load shedding techniques. illodate locking of ilnport,a.nt objects and/or queries.
7.1 Single Execution: Size of the Cache Area
7 Experimental Results
In this section: \Ire st,rldy the performance of various aspects of SOLE that includes: the size of the cache area. the benefit of encapsulating SOLE in a pipeline operator, the grid size of t,he shared memory buffer-, the scalabilit,y of SOLE, and approximate query processing via load shedding techniques. All the expel.iment,sin this sectioil a.re based on a real in~plementat~io~l of SOLE algorithms and opera,tors inside our prototype clata,base engine for spatio-telnporal stl.eal~~s: PLACE [30,31]. \Ve run PLACE on Intel Pentium IV CPU 2.4GHz wit11 512h4B RAM runllil~g \\;illdo\vs XP. Without loss of generality: all the presented experiments are conducted on stationary and n~oving continuous spatio-temporal queries. Similar results are achieved \\:hell employing continuous k-nearest-neighbor queries. Figures 14a-d give the performance of the first 25 seconds of executing a inovillg query of size 0.5% of the spa.ce with no cache, 25%)cache: 50%' cache, and conservative cache (i.e., 100% cache), respectively. Our perforinance measure is the query accuracy that is represented as the percentage of the n~unberof produced tuples t o that should have been produced if all the actual n~unber mo\~ing objects are materialized into secoildarg storage. \?Tithout caching (Figure 14a), the query accuracy suffers fro111 continuous fluctuations where sometilnes the accura.cy drops t o 85%. With only 25%#cache the query accuracy is greatly enhallcecl (Figure 14b). The accuracy is almost stable with minor fluctuations that degrade the accuracy t o only 95%. A conservati,ue caching would ret sult in having a single line t l ~ a al\\iays have 100% accuracy.
(a) No Cache
Fig. 14 Cache area in SOLE.
(b) 25% Cache
(c) 50% Cache
(d) Conser\lati\ie Cache
1 0 0 % Cache --(3-5 0 % Cache -a--2 5 9 Cache +
(a] SELECTION
0 5 10 15
[b)JOIN
Fig. 16 Pipelined SOLE operators.

20 25
Time
Fig. 15 Cache area in SOLE
Figure 17 compares the high level implementation of the a,bo\-e quer!- \\rit,ll pipelined INSIDE operator for bot.11 stationary and mo\;ing queries. T h e selectivity of t,he queries varies fi-om 2% to 64%. T h e selecti\rity of the selection operator is 5%,.Our measure of comparison is the number of t,uples that go through the query eval~lation pipeline. When SOLE is iinpleinented a t the applica.tion level: it,s performance is not affected by the when INSIDE is pushed before query select,i\!it!;. Ilo\\~e\;er. the selection., it acts as a filter for the query evaluatioil pipeline, thus, li~nitingthe tuples through the pipeline 7.2 Single Execut'ion: Pipelined Query Operators to only the progressive updates. With INSIDE selectivit,y less than 32%1: pushi~lgINSIDE before the selectioil Consider the query Q: "Contin.uously report all t~-.u,cks th,at are w ~ t h , ~ n AdyArea:'. MyArea can be either a sta.- grea,tly affects the performance. However, with s e l e c t i ~ tionary or moving range query. A high level i~nplemen- ity inore than 32%':it would be better to have the INSIDE tation of this query is to have only a selectio~l operator operator above the selection operator. Consider a more complex query plan that contains a that selects only the "trucks'.'. Then, a high le\:el algojoin opei-a.t,or.The qller!: Q: "Continuo.u.sly report Tnovrithm iinpleinentation wollld take the selection output However. an ing objects tha,t belong to 7ny ,favorite set of ol~jectsand and incrementally prodllce the query rest~lt. encapsulation of SOLE into a pl~ysicalpipelined query ili.a,l.lie within A,fyAren:'. A high level implementation of operator allows for inore flexible plans. Figure lGa gi\fes SOLE \\iould probe a streanling database engine to join a query evaluation plan \vl~enp ~ ~ s h i n g SOLE oper- a.11 mo\,ing objects \\lit11 my fa,voriteset of objects. Then, the ator before the selection operator. The following is the t,he output of the join is sent t o the SOLE algorithm for furt,her processing. However, with the INSIDE operator, SQL presentation of the querv. we can haye a query evaluation plan as that of Figure lGb SELECT h,I.ObjectID where the INSIDE operator is pushed below the Join op-
Figure 15 gives the memory overhead \\:hen using a 25%: 50%,,or 100% (conservative) cache sizes. T h e o\:erhead is computed as a, perceiita,gefi-om the original query memorjr requirements. 7111~sa 0%' cache does not incur any overhead. On avera,ge a 25% ca,che results in only 10% overhead over the original query: \vllile the 50%)a.nd 100% caches result ill 25% and 50% o\~erhead, respectively. As a coinpromise between the cache overhead and the query accuracy. we use a 25%' cache in SOLE in all the following experiments.
FROM hIovingObjects hi WHERE hI.t,!;pe = "tl-u.ck" INSIDE AdyArea
< l i d 21-c
tilid
9 r
(a) Redundancy
Query S e l e c t i v i t y
(b) Response Time
Fig. 19 Grid Size.
Fig. 1 7 Pipelined operat,ors wit11 SELECT
3000
25OOLA
4
,$
2000 1500 1000 -
5
C
LC
4
a m
SOLE a s a n O p e r a t o r + SOLE a s a Table F u n c t i o n -A
O I
T I, 3 0 . 4 <, vvci-i
0.6
(I
0.1
Zi.C
a
500
(a) Ratio
(b) Table of values
0 A 0.02
0.04 0.08 0.16 0.32 0.64
Fig. 20 hlaximum Number of Supported Queries.

Query S i z e
Fig. 18 Pipelined operators xvith Join.
7.3 Scalable Execution: Grid Size

Figure 19 studies the trade-offs for the number of grid cells in the shared memory buffer of SOLE for 50K inovil7g clueries of \ J ~ ~ ~ O U S Increasing the number of cells sizes. in each dinlension increases the redundancy that result,s froin replicating the query entry in all overlapping grid cells. On the other hand: increasing the grid size results in a bett'er respoilse time. T h e response time is defined as the t,iine interval froill the arrival of an object, say P: to either the time that P a m e a r s a t the o u t ~ u of SOLE or t the time that SOLE decides t o discard P: IVhen t,he grid size increases over 100. the response time performance degrades. Having a grid of 100 cells in each diinensioil results in a total of 10I< small-sized grid cells, thus. with each movement of a moving query Q. we need to registei-Iunregister Q in a large number of grid cells. As a coinpronlise betvrreen redmlclancy and response time. SOLE uses a grid of size 30 in each dimension.
erator, TIle SQL represelltatioll of the abo\re query is as follows:
SELECT hI.ObjectID
FROM hloving0bjects 1.1: h.IyFavoriteCars F
W E E 1,l.ObjectID = F.ObjectID HR INSIDE AJyArea Figure 18 compares the higll level implement~at~ion of the above query wit11 the pipelined INSIDE operator for both stationary a.nd moving queries. The selectivity of the queries varies from 2%)to G4%#.As in Figure 17: t,he selectivity of SOLE does not affect the perforinance if it is ilnpleineilted in tlle application level. Unlike the case of selection operators. SOLE pro\rides a dramatic illcrease in the performance (around a,n ortler of magnitude) \vIlen implemented as a pipelined operator. The nia.in reason in this dramatic ga,iil in performance is the high overhead incurred whell evaluating the join operation. Thus. the INSIDE operator filters out the input tuples and limit, the input t o the join operator to only the incremental positive and negative updates.
7.4 Scalable Execution: SOLE Vs. Non-Shared Executioii

Figure 20 compares the performance of the SOLE shared operator as opposed to dealing with each query as a separate entity (i.e., with no sharing). Figure 20a gives the ratio of the number of supported queries via sharing over the 11011-sharing case for various query sizes. Some of the
' , .
<
<..I
.C
( . A
II
<I
'.
(..! U.?
i
i
(a) Query area
(b) Cache area
(a) Number of queries
(b) Query size ki Percent or moving queries
Fig. 2 1 Data size in t.he query and cache areas.
Fig. 22 Response t,il-nein SOLE
act,ual values a.1-edepicted in the table ill Figni-e 20b. For snla.11 query sizes (e.g., 0.01%) \vith sharing. SOLE supO ports more than G K queries, n:hich is alrnost 8 tiines better than the case of non-sharing. 7'he performance of sharing increases with the query size \\:here it beconles 20 ti~lles better than non-sharing in case of query size 1 % of the space. The inail1 reason of the increasing perforinance with the size increase is that sharing benefits from the overlapped areas of continuous queries. Objects that lie ill any overlapped area are stored only once in the sharing case rather than multiple times in the non-sharing case. \4:ith small query sizes, overlapping of query areas is much less than the case of large query sizes. Figures 21a and 21b give the memory requirenlents for storing objects in the query region a.nd the query cache area: respectively: for 1I< queries over lOOK moving objects. In Figure 21a, for large query sizes (e.g., 1% of the space), a 11011-shared executioil \vould need a inemor? of size lhl objects, while in SOLE. we need, at no st, a meinory of size lOOK objects. T h e nlain reasoil is that \vitli 11011-sha,ring:objects that are needed by multiple ly queries are r e d ~ ~ n d a n tst,ored in each query buffer: while with sharing: each object is stored a t most once in the slla.red inemory buffer. Thus, in terms of the query area: advantage over the SOLE has a ten t,imes perforn~ance non-shared case. Figure 21b gives the memory requireinent for storing objects in the cache area. The behavior of the non-sharing case is expected where the ineinory rec~uiren~ents increase wit,l~ increase in the query size. the Surprisingly: the caching overhead in the case of sharing decreases \\:it11 the increase in the query size. The main reason is t.hat ~\:itht,he size increase, t,he ca.ching area of a certain query is likely t o be part of the actual area of allother query. Thus. objects that are inside tliis caching area are not considered an overhead, where t,hey are part of the actual ans\\:er of some other query.
mance measure is the average response time. The response tiine is defined as t;he tiine iilter~ralfi-om the arrival of object P t,o eit,her the tiine that P appears at, the output of SOLE or the t,ime that SOLE decides t o discard P. We run the experiment twice: once with only stationary queries. and the second time wit11 only m o ~ i n g queries. The increase in response tiine with the nuinber of queries is acceptable since as r e increase the number of queries 10 times (fi-0111 5K t o 5010. we get only twice the increase in response tiine in the case of stationary queries (from 11 to 22 msec). The perforinance of moving queries has only a slight increase over stationar2- queries (2 insec in case of 50K qnei-ies). Figure 22b gives the effect of varying both the cluery size and the percentage of moving queries on the response time of the SOLE operator. The nunlber of outstanding queries is fixed t o 30K. T h e response time increases wit11 the increase in both the query size and the percentage of moving queries. However, the SOLE operator is less sensitive t o the percentage of moving queries than to the query size. Increasing the percentage of mo\ring queries results in a slight increase in response time. This performance indicates that SOLE call efficiently deal with moving queries in the same performance as with stationary queries. On the other hand, increasing the query size from 0.01% t o 1% only doubles the response time (from around 12 insec t o around 24 msec) for va.i.ious moving percentages.
7.6 Load Shedding: Accuracy in Query Ans\ver
Figures 23a and 23b compare the perforinai~ce query of and object load shedding techniques for processing 1Ii and 25Ic queries with vai-ious sizes. respectively. Our performance iileasure is the reduced load t o achieve a ceratin query accuracy. \Vhen the systeiu is overloaded. we vary the required accuracy froin 0%) to 100%. I11 degenerate requires keeping the cases, setting the accura,cy t o 100%~ 7.5 Scalable Execution: R.esponse Time whole menlory load (100% load) while setting t,he accuFigure 22a gives the effect of the llunlber of collcurre~lt racy to 0% requires deleting all memory load. T h e bold continuous queries on the performance of SOLE. T h e diagonal line in Figure 23 represents the required accuaccurac!,, we number of queries varies froin 5K to 50K. Our perfor- racy. It is "expected" that if we ask for m.%,
Fig. 25 Perfornlance of Object Load Sheddil~g.
(a) 1K Queries
Fig. 23 Load Vs. Accuracy.
(b) 25K Queries
reduced area. Such tuples are still of int,erest to ot,lier outstanding queries. Figure 25 focuses on the performance of object load shedding. T h e rec11lil.ed reduced load varies from 10%.to 90% while the number of queries varies from l l i to 321i. This experiment shonrs that object load sl1edcling is scali~lcreasing number of qt~eries. the able and is stable 1~11en For example: u~hen reducing the memory load to 90%. we consistently get an a c c l ~ r a c j ~ around regardless of the number of queries. Such consistent behavior appears in various reduced loads.
(b) 90% Accuracy
(a) 70% Accuracy
7.7 Loa,d Shedding: Scalability wit'l1 Load Shedding

Fig. 24 Reduced load for a certain accuracy.
will need t o keep only m.% of the memory load. Thus: reducing the memory load t o he Ion-er than the diagonal line is considered a gain over the "expected" behavior. The object load shedding always inaint,ains better performance than that of the query load shedding. For example; in the case of II< clueries, to achieve an average accuracy of 90%,: we need to keep track of only 85% of the memory load in the case of object load shedding \vl]ile 97% of the memory is needed in the case of query load shedding. The perforinance of both load shedding techniques is worse \vit,h the increase in the number of queries to 25K. However, the object load shedding still keeps a good performance where it is allnost equal to the "expect,ed" performance. The performa.nce of query load shedding is dra~ilaticallydegraded where we need more than 90% of the memory load t o achieve only 20% accuracy. Figures 24a a i d 24b compare the performance of query and object load shedding to achieve a,n accuracy of 70% and 90%'; while varying the number of queries froin 21i to 321<. The object load shedding greatly outperfornls the q.uery load shedding and results in a better performance than t,he "expected' reduced load for all query ~ sizes. The i i i a i ~reason behind the had performance of query load slledding is t,l]at in the case of a large number of queries. there are high overlapping areas. Thus, the reduced area of a certain query is highly likely to overqueries. So, even though we reduce the query lap otl~er area. we cannot drop any of the tuples that lie in the
Figure 26a gives the ra.tio of the number of sl~pportecl queries with query and object load shedding technic~ues over the sharing ca,se with no load shedding. All queries ~n are supported with a n ~ i n i ~ n uaccuracy of go%,. Depending on the query size, query load shedding can support up to 3 times more queries thail the case u.itl1 no load shedding. This indicates a ratio of up to 60 times better than the no]]-sharing cases (refer t o the table in Figure 20h). On the other hand: object load shedding has 111uch better scalable performance than that of query load shedding. \Vit,h object load shedding SOLE can ha.ve up to 13 times more queries than the case of no load shedding, which indicates up t o 260 times than the case of no sharing. Figure 26b gives the performance of the query and object load shedding techniques in terms of ma.inta.ining the average query accuracy with the arrival of continuous queries. The horizontal access advances with time t o query. With tight represent the arrival of each conti~luous memory resources. the memory is c o n s ~ ~ ~ n e d completely with the arrival of a.bout 1200 queries. At this point, the process of loatd shedding is triggered. T h e required memory consuinption level is set t o 90%. Since qu.ery load shedding inlinediately drops tuples from nlemory: the query a.ccuracy is dropped sharply to 90%).1 1 contrast., in 1 object 1oa.d shedding: the accuracy degrades slonily. With the a,rrival of more queries, query load shedding ti-ies to slowly enhance it's perfor~nance.However: the memory the recovery of query load consumption is faster t ' l ~ a n shedding. Thus. soon. we mill need t o drop sonle more tuples fro111 memorj. that will result in less accuracy.
Tatb~ll. Xing: Y.: Zdonik. S.: The Design of the RoreN.: alis Stream Processing Engine. In: Proceedings of the Inbernational Conference on Innovative Data Syste~ns Research: CIDR (2005) 2. Abadi: D.J.: Carney: D., Cetintemel: U.. Cherniack: hl.: Convey, C.: Lee, S.. Stonebra.ker, hl.: Tatbul. N., Zdonik, S.B.: Aurora: A New h,lodel and Architecture for Data Stream hlanagement. VLDB Journal 12(2) (2003) 3. Arasu, A., Widom, J.: Resource Sliaring in Continuous I . . L I " . , <. Sliding-\Vindo\\f Aggregates. In: Proceedings of the Ins, vue, ,rnr "r u n n c , ternational Conference on Very Large Data Bases. \rI,DB (2004) (a) Ratio (b) Query Arrival -1. Ayad, A.: Naughton: J.F.: Static 0pt.imization of Conjunctive Queries with Sliding \Vindo\vs Over 1nfi11it.e St.rean~s. Proceedings of the AChl International Con111: Fig. 26 Scalability \vitll Load Shedding ference on hlanagement of Data: SIGh.IOD (2004) 5. Babcock, B.: Datar: hsl., hlota~ani: Load Shedding for R.: Aggregation Queries over Data Streams. 111: ProceedT h e bellavior continues with txvo contradicting actions: ings of the International Conference on Data En~ineerin:. ICDE (2004) (1) Query 1oa.d shedding tends t o enhance t h e accuracy 6. Babu. S.. Widom. J.: Continuous Queries o\.er Data by retaining t h e original query size, a n d (2) T h e arrival streak. '~1Gh.10~ Record 30(3) (2001) of more queries c o n s ~ u n e s memory resources. Since t h e 7 . Brilllchoff. T.: A Fra.me\\rork for Generating Net.\\.orksecoild action is faster tllan t h e first one: t h e perforBascd \loving Objects. Geolnfornlatica 6(2) (2002) 8. Cai, Y.. Hua: K.A.! Cao. G.: P1,ocessing Rangeinance has a zigzag behavior t h a t leads t o reducing t h e hlonitoring Queries on Heterogeneous hlobile Objects. query accuracy. O n t h e other ha.nd. object load shedding In: Proceedings of the Internat.iona1 Conference 011 hlodoes not, suffer from this drawback. Instead: d u e t o t h e bile Data. h?a.nagement: h1Dh.I (200:l) smartness of choosiilg victim objects, object load shed9. Chakka: V.P., Everspaugh! A.: Patel. 3.31.: Indexing Large Tra.ject,ory Data Sets with SETI. In: PI-oceedings ding always maintains sufficient. accuracy with ininiin~uin of the Internat,ional Conference on Innovative Data Sysmemory load. tems Research, CIDR (2003) 10. Cha.ndrasekaran, S., Cooper. 0.: Deshpailde: A.. Franklin, hl.J., Hellerstein, J.hl.: I-long. \\:.: I<risl~nan~urtliy,S.: h?adden: S.: Raman. V..Reiss: F.: Shah: 8 Conclusion h1.A.: TelegraphCQ: Continuous Dataflow Processing for an Uncerta.in World. In: Proceedings of the International We presented t h e Sculnble On.-Line Execution algorithin Conference 011 Innova.t,iveData. Syst,ems R.esearch. CIDR (2003) as (SOLE, for short) for c o n t . i n u o ~ ~n d on-line evaluation S.. spatio-temporal clueries over 11. Cl~andrasekaran, Franklin. h.1.J.: PSoup: a s!-stem for of concun-ent c o n t i n ~ ~ o u s strean~ingqueries over streaming data. \ ' L D ~ Journal spatio-temporal da,ta streams. S O L E is a n in-memory 12(2): 140-156 (2003) algorithin t h a t ut'ilizes t h e scarce inelnory resources ef- 12. C h e l ~ ;J.: DeWitt, D.J.. Tian, F.. \\'ang: Y.: NiagaraCQ: A Scala.ble Continuous Quer!. System for Interficiently by keeping track of only those objects t h a t a r e net Databases. In: Proceedings of the ACh.1 International considered signi$cant. S O L E is a unified framework for Conference 011 h?anagelnent of Data.. SIGhIOD (2000) stationary and moving queries t h a t is encapsulated into 13. Dobra.: A., Ga.rofa.laltis, h.1.N.. Gehrke, J.: Rastogi. R.: a physical pipelined query operator. T o cope with inStatic 0pt.imization of Conjunctive Queries \\;it11 Sliding \Vindo\\~sOver Infinite Streams. In: Proceedings of the tervals of high arrival rates of objects and/or queries, Interna.tiona1 Conference on Extending Database TechS O L E utilizes loatl shedding techniques t h a t a.im t o supnolog~r:EDBT (2004) p o r t more continuous queries, yet with a n approximate Liu, L.: ILlobiEyes: Distributed Processing of 14. Gedik, B.! answer. Two load shedding techniql~eswere proposed: Continuously hfoving Queries 011 hIo\:ing Object,s in a. namely: query load shedding and object load shedding. hIobile System. In: Proceedings of the International Conference on Extendine: Dat.abase T e c l ~ n o. o ~ ~ . EDBT Experimental results based o n a real iinpleinentation of ul (2004) S O L E inside a prototype d a t a stream inanageinent sys15. Golab, I,.: Ozsu: h1.T.: Processing Sliding \\?indo\\: hlultiten1 show t h a t S O L E can support u p to 20 tiines inore Joins in Cont.inuous Queries over Data Streams. In: Procontinuous queries t h a n the case of dealing with each ceedings of the International Conference on Very Large Data Bases, VLDB (2003) query separately. W i t h object loa,d shedding, S O L E can support up t o 260 t,imes inore queries t h a n t h e case of 16. Hanlmad. h.1.A.: Franklin. h.1.J.. Aref. \Y.G.: Elmagarmid, A.K.: Scheduling fbr shared \\:indon: joins over n o sharing. data st.reams. In: Proceedillgs of the International Conference on Very Large Data Bases. VLDB (2003) 17. Mamn1a.d: h1.A.. Ghanem: T.h.1.: Aref. \\;.G.: Elmagarmid, A.K., hlokbel, h1.F.: Efficient. pipelined execuReferences tion of sliding-\\:indo\\i queries over data strean~s.Tech. Rep. T R CSD-03-035, Purdue University Department of 1. Abadi: D., Ahmad! Y.. Balakrishnan: H., Ba.lazinska. hl., Comput.er Sciences (2003) Cetintemel, U.: Cherniack, h l , H\\rang, J.H., Ja.nott.i. J.! 18. Hamnlad, hl.A., h,Iokbel: h1.F.. Ali. h1.H.. Aref. \\;.G.; Lindner, W.. hladden: S.: Rasin. A.. St.onebraker, h.1.: Catlin. A.C.: Ellnagarmid. A.K.: Eltabakh. h1.: Elfek!:.
500
i<l<,LI ,500
:Dull
'LOO
20111,
I:un.bc,
,c*
19. 20.
21.
22.
23.
24.
h1.G.: G l ~ a u e m T.M., G\vadera: R . , Ilyas. 1.F.: hlarzouk. 35. , hl., Xiong, X.: Nile: A Query Processing Engine for D a t a Streams (Demo). In: Proceedings of t h e lnterna.t.ioi~al Conferelice on Data Engineering, I C D E (2004) http: /~~~~~~v.fcc.gov/9ll/enl~anced/: 36. Hu, .. Xu, .I., Lee, D.L.: A Generic Frame~vorkfor hlonit~oring'~ontinuous Spatial Queries over hloving object,^. In: Proceedings of t h e AChl International Confere~ice on hlanagement of Data., SIGMOD. Baltimore: h l D (2005 I~verks, G.S.: Samet, H., Smith, K.: Continuous - 37. Nearest Neighbor Queries for Cont.inuousl~-hlo\.ing Points with Updates. In: Proceedings of t h e Int,eriiatio~~al ConPerence on Very Large Dat.a. Bases, VLDB (2003) 38. .Jensen, C.S., Lin, D . , Ooi, B.C.: Query aiid U p d a t e Efficient B+-Tree Based Indexing of hlo\:ing Objects. In: Proceedings oP t h e International Conference on Very Large Data Bases, VLDB (2004) 39. K\von. D . . Lee! S.. Lee; S.: Indexing t h e Current. Posit.ions of Moving 0bject.s Using the Lazy U p d a t e R-tree. 111: Proceedings 01 t.he International Confere~lceon hlobile Data. h~lanagenlent, h,lDh,l (2002) Lazaridis. I.: Porkae\\l: K . , hiehrotra: S.: Dynamic .40.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
Queries over hslobile Objects. In: Proceedings of t h e InterDatabase Techiiolog.. national Conference o n - ~ x t e n d i n g 41. E D B T (2002) Lee. hl.l,.. Hsu. 1V.. Jensen. C.S.. Teo. I<.I,.: S u u u o r t i ~ ~ n Frequent u p d a t e s 61 R - ~ r & s :~ ' ~ o t t ' o m -- U P . ~;;roaclL 111: ~ r o c e e d i n g sof t h e International ConPerence on Very 42. Large Data Bases. V L D B (2003) hladden. S.. Shah. hl.. Hellerstein. J . h l . . Rarnan. V.: C o n t . i i ~ u o u sadaptive ~ o n t i n u o n squeries 'over streams. l~~ 111:P r o c e e d i n ~ s t h e ACM Internat~ional of Conference 011 43. h,lanagement,;P D a t a , SIGh4OD (2002) h.lokbel, h1.F.. Aref, 1V.G.: GPAC: Generic a n d Progressive Processing of hlobile Queries over &.lobileD a t a . 111: P r o c e e d i n ~ sof t h e Internat,ional Conference on hlobile 44. D a t a PlanYagelnent. h/IDhl 2005 hIokbel.hI.F.,Aref,W.G.. am rusch.S.E.l'rabl1akaP S.: To\\.ards ~ c a l a . b l e ~oca.tion-awareServices: Requirements and Research Issues. In: Proceedings of t h e AChl Symposium on Advances in Geographic Information Sys- 45. tems: ACh4 G I s (2003) h,lokbel: h1.F.: Xiong, X . , Aref, 1V.G.: SlN1-2: Scalable Incremental Processing of Cont~inuousQueries in Spatiotemporal Databases. In: Proceedings of t h e ACI\I Int,ernationa.1 Conference on Manaeement of Data. SIGh.lOD 46. (2004) hllokbel, h,l.F., Xiong: X.: Aref, W.G.! Halnbrusch. S.: Prabhakar, S.. Ha.mma.d,h.1.: PLACE: A Query Processor for Handling Real-time Spatio-temporal D a t a St.reams (Denlo). In: Proceedings of t h e Int,ernational Conference on Very Large Data. Rases, VLDB 2004) hlokbel, R1.F.. Xiong, X.: Hamlna : hl.A.. Aref: 1V.G.: Continuous Query Processing of Spatio-temporal D a t a Streams in PLACE. In: Proceedings oP t h e second workshou on S~atio-Teniuoral Database hlaiianen~ent. u S T D B (2004 ~~ h l o t ~ ~ a nR.. \,iidoin. J . , Arasu, A , . Babcock. B.. Babu. i. S.: ~ a t a r h'l., hlanku, G.S., ~ l s t o d C.: R o s ~ n s t e i n : . ; , , J \Tarma: R . : Query Processing, Approximation. and Resource hlanageinent in a. Da.t,a Stream hlanagenlent. System. In: Proceedings of t h e International Conference on Innovat~ive a t a Syst,ems Research, C I D R (2003) D h.louratidis: K.: Papadias; D . , Ha.djieleft.heriou. hl.: Conceptual Part.itioning: Ail Efficiei~t h:Iet,llod for Cont.inuous Nearest Neighbor I\Ionit~oring.111: Proceedings oP t h e AChl International Conference on h'lanagement of Data. SIGh.IOD. Baltimore: h l D (2005) Nadeem, T.: Dashtinezhad, S.: Liao, C., lftode: L.: Trafficview: A Scalable Traffic hlonitoring System. In: Proceedings of t.lle International Conrerence 011 hlobile Data h,lana.gen~ent: hIDhl: pp. 13-26. Berkeley, C A (2004)
h b
. ,
Patel. J . \ l . . Clleli. Y.. Chakka, V.P.: STRIPES: An Efficient Index for Predicted Trajectories. In: Proceedings al of t h e AChl I n t e r ~ ~ a l i o nConPerence on hlaila.geinent of Data, SIGAIOD (2004) Prabllakar. S.: Xia.. Y.. Kalash~iikov.D.V.. Aref, 1V.G.: Hambrusch. S.E.: Quer)- Indexing and \ielocity Constrained Indexing: Scalable Techniques for Continuous Queries on h,lo\;ing Objects. l E E E Trans. 011 C o ~ n p u t ers 51(10) (2002) Reiii\vald, B.. Pirahesh, H.: SQL Open Heterogeneous Data Access. In: Proceediilgs of t h e AChd I ~ ~ t e r n a t i o n a l Conference OII Rlaiiagel~~ent Dat.a: S l G h l O D (1998) of Rein\vald, B.: Pirahesli, H.: Krisli1~ai~1oort11~~~ G.: Lapis, G . . Tran, B.T., Vora. S.: Heterogei~eousquery processing t.hrough sql table functions. 111: Proceedings of t h e Internat.iona1 Conference OII Data Engineering: I C D E (1999) Sa.lt.enis. S.: Jensen, C.S.. Lerrtenegger. S.T., Lopez, h1.A.: Indexing the Positions of Cont~inuously h;loving Objects. In: Proceedings oP t he ACh4 International Conference on hlanagement of Data. SIGh,IOD (2000) Song, Z.. Roussopoulos. N.: Hasliiiig hloving 0bject.s. In: Proceedings of t h e Iiiternational Conference on hlobile D a t a hlanagemellt. h1Dhl (2001) n o . Y . : Papadias. D . : Slien, Q.: Contin~iousNearest, Neighbor Search. 111: Proceedings of t h e International Coi~ferenceon Very Large Data Bases. VLDB (2002) J Tao, Y.. Papadias, D.: S I I I ~ ..: T h e TPR*-Tree: An Optimized Spatio-teniporal Access hlethod for Predictive Queries. In: Proceedings of t h e International Conference on Very Large Data Bases, VLDB (2003) Tatbul! N.: Cetintemel. U.; Zdonik. S.B.: Cherniack, hl.: Stonebraker, hl.: Load Shedding in a Da.ta Streain hslanager. In: Proceedings of the Internat.iona1 Conference on Very Large Data Bases: V1,DB (2003) Xiong, X.: h'lokbel, h1.F.: Aref. 1V.G.: SEA-CNN: Scalable Processing of Coiitinuous ]<-Nearest Neighbor Queries in Spatio-t,emporal Databases. In: Proceedings of t h e 1nterna.tional Conference 011 Dat.a.Engineering, I C D E (2005) Xiong, X.: h'lokbel, A.I.F.. .Are[: W . G . , H a m b r u s c h ~ S.: Prabhakar, S.: Scalable Spatio-temporal Continuous Query Processing for Location-aware Services. 111: Proceedings of t h e International Conference on Scientific and Statistical Database hlanagement., SSDBh3 (2004) Zhang. J . , Zhu, hl.: Papadias, D.: Tao, Y.: Lee! D.L.: Location-based Spatial Queries. In: Proceedings of t h e AChI Interna.t.iona1 ConPereiice on h,Iana.gement of Data., S1Gh:lOD (2003)
1'

Sole: Scalable On-Line Execution of Continuous Queries On Spatio-Temporal Data Streams

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Sole: Scalable On-Line Execution of Continuous Queries On Spatio-Temporal Data Streams

Uploaded by

Copyright:

Available Formats

SOLE: SCALABLE ON-LINE EXECUTION OF CONTINUOUS QUERIES ON SPATIO-TEMPORAL DATA STREAMS Mohamed F. Mokbel Walid G.

CSD TR #05-016 July 2005

vldb manuscript No.

SOLE: Scalable On-Line Execution of Continuous Queries on Spatio-temporal Data Streams

the date of receipt and accrptancc sl~ouldbe inserted later

Similarly, the knn-clause may 11a.ve one of two foi-111s:

4 Single Execution of Continuous Queries in

(d) Moving Query

Fig. 1 Positlvr/Negali>e updates in SOLE

4.1 Uncertainty in Spat,io-tempoi.al Queries

Fig. 2 U~~cert.ai~~ty in spatio-temporal queries.

Ic) Snapshot at tlme T? The query Is moved

Fig. 3 Avoidiilg uncertaint~in SOLE.

5 SOLE: Scalable On-Line Execution of Continuous Queries

(a) Separate query plan and buffer for each query

buffer pool for all queries

Fig. 4 O\'erview of shared executjoll in SOLE.

5.2 Shared hlemory Buffer

Procedure Incoi~~i~~gNeu~Object P ; GridCell CP) (Object Begin

Stream of moving objects [PI

stream of spetiotemporal querlcs (Ql

Fig. 5 Shared join operator in SOLE.

5.3 Shared Spat'io-temporal Join Operator

Procedure UpdateQrlery(query Qold: Begin Q)

cases of updating P's location

m)Actloo taken for each case

Fig. 8 All cases of updating P's 1ocat.ion Procedure St,at,ional.yQuery(queryQ) Begin

End. Fig. 10 Pseudo code for updating a query.

End. Fig. 9 Pseudo code for receiling a new query Q

6 Approximate Query Processing in SOLE

Shared Join Operator

la) All cases of updating query region

b)Action taken for each case

Fig. 1 All cases of updatil~g 1 Q's region.

7.1 Single Execution: Size of the Cache Area

(b) 25% Cache

(c) 50% Cache

(d) Conser\lati\ie Cache

1 0 0 % Cache --(3-5 0 % Cache -a--2 5 9 Cache +

Fig. 16 Pipelined SOLE operators.

Fig. 15 Cache area in SOLE

FROM hIovingObjects hi WHERE hI.t,!;pe = "tl-u.ck" INSIDE AdyArea

(b) Response Time

Fig. 19 Grid Size.

Fig. 1 7 Pipelined operat,ors wit11 SELECT

2000 1500 1000 -

SOLE a s a n O p e r a t o r + SOLE a s a Table F u n c t i o n -A

(b) Table of values

0.04 0.08 0.16 0.32 0.64

Fig. 20 hlaximum Number of Supported Queries.

Fig. 18 Pipelined operators xvith Join.

7.3 Scalable Execution: Grid Size

erator, TIle SQL represelltatioll of the abo\re query is as follows:

7.4 Scalable Execution: SOLE Vs. Non-Shared Executioii

(a) Query area

(b) Cache area

(a) Number of queries

(b) Query size ki Percent or moving queries

Fig. 2 1 Data size in t.he query and cache areas.

Fig. 22 Response t,il-nein SOLE

7.6 Load Shedding: Accuracy in Query Ans\ver

Fig. 25 Perfornlance of Object Load Sheddil~g.

Fig. 2 Ucert.aity in spatio-temporal queries.

Procedure IncoiigNeu~Object P ; GridCell CP) (Object Begin