a domain. We assemble a hierarchy of indices (middle of Figure 2) to existing domain ontologies (e.g., Enzymes) to unambiguously categorize the type of operation inputs and outputs.

2.1.2 Recognition and Composition

Much like molecules in the natural world, which can recognize each other and form bonds [2], Web services and operations can recognize each other through both syntax and semantics. Consequently, they can compose and bring about potentially interesting behaviors. We identify two types of operation recognition: direct and indirect recognition, and two types of service recognition: promotion and inhibition.

Direct Recognition. A direct recognition is established between operations op_a and op_b if op_a consumes an operation interface op_intf that is implemented by op_b. In addition, op_a and op_b must be mode, binding and message composable [3].

Indirect Recognition. A target operation op_t indirectly recognizes a source operation op_s if op_s generates some or all input parameters of op_t. An example of indirect recognition is shown in Figure 2 between operation produce mucus from the Stomach Cell service and polymorphic operation produce PG from the service of a type of enzyme called COX1. We use the term indirect to indicate that parts of the output message from op_s may need to be relayed to parts of the input message to op_t at the composition level. A bond is established between op_s and op_t for each input parameter op_t can receive from op_s. We denote the set of bonds between op_s and op_t as B(op_s → op_t). If we refer to the set of all operations that op_t recognizes as

tity increases the availability of the service. For example, the quantity of Mucus is increased by (Stomach Cell service:produce mucus) in Figure 2. Consequently, the Mucus service becomes more available on the wall of the stomach, which becomes better protected from erosion and ulcer caused by gastric juice that is present there. Thus by definition, (Stomach Cell service:produce mucus) promotes Mucus service.

Inhibition. Similarly, when operation op_1 of service s_a consumes an entity (i.e., input parameter) that in turn provides service s_b, we say that s_a:op_1 inhibits s_b. Figure 2 shows an example of inhibition between Aspirin service:block COX1 and COX1 service.

For promotion, inhibition and indirect recognition, we identify three types of matching between parameters p_1 and p_2, whose data types refer to domain ontology index nodes n_a and n_b, respectively:

• Exact match: n_a = n_b
• Is-a: n_a is a child of n_b
• Has-a: n_a has a component n_b

We assume that the above relationships among parameter types are already declared in domain ontologies and thus can be automatically detected.

Composition Validity. Various measures [3] have been proposed to determine whether two operations are composable at both syntactic and semantic levels. These measures can be used to determine whether a direct recognition-based composition is actually valid. Promotion and inhibition-based compositions are valid because the entities of interest provide the corresponding services by declaration. In this section, we focus on how the validity of an indirect recognition-based composition can be determined in the verification phase. We denote comp(OP_s, op_t) as an operation composition involving a set of source operations OP_s providing input parameters to target operation op_t, where OP_s ⊂ OP_s(→ op_t). In order for comp(OP_s, op_t) to be valid, the following must be true:

∀ op_s ∈ OP_s, Γ[op_s.L, op_t.L] ≠ 0    (2)

In Eq. (2), Γ is a domain expert-determined correlation function that measures the relevancy (i.e., 1 for the same and non-zero for related) of two locales. Eq. (2) states that in order for the composition to be valid, each of the source operations must have a locale that correlates to that of the target operation. A relevant bioinformatic example would be to make drug molecules effective (or compose with disease

[Figure 2. Web Service Models. Recoverable labels: COX1, PGG2, PGI2, energy, Stomach Cell, Fatty Acids, Arachidonic Acid, Omega-3, Mucus, stomach wall; operations produce PG, produce mucus, cover. Legend: Web service; Web service function/behavior; ontology node; extension; inhibition; input of ontology node type meeting pre-condition; function having output of ontology node type and post-condition. Annotation: NodeAgent keeps track of parameter pub/sub in Alg-2 and Alg-3 as described in Section 3.2.]
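The three parameter matching types (exact, is-a, has-a) and the locale condition of Eq. (2) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy ontology entries, the locale names, the correlation values, and the function names match_type, gamma and composition_is_valid do not come from the paper. The sketch also assumes that is-a is transitive and that Γ is symmetric, and it reads Eq. (2) as requiring a non-zero correlation, as the accompanying text describes.

```python
from typing import Optional

# Hypothetical toy ontology index, for illustration only.
# PARENT maps a node to its is-a parent; PARTS maps a node to its has-a components.
PARENT = {"COX1": "Enzyme", "Enzyme": "Protein"}
PARTS = {"Protein": {"Amino Acid"}}

# Hypothetical expert-assigned correlations between distinct locales.
GAMMA_TABLE = {frozenset(("stomach", "small intestine")): 0.5}


def match_type(n_a: str, n_b: str) -> Optional[str]:
    """Classify how node n_a matches node n_b, or return None if it does not."""
    if n_a == n_b:
        return "exact"
    # Is-a: walk up the inheritance chain from n_a (assumes transitive is-a).
    node = PARENT.get(n_a)
    while node is not None:
        if node == n_b:
            return "is-a"
        node = PARENT.get(node)
    # Has-a: n_b is declared as a direct component of n_a.
    if n_b in PARTS.get(n_a, set()):
        return "has-a"
    return None


def gamma(locale_a: str, locale_b: str) -> float:
    """Correlation of two locales: 1 if identical, table value if related, else 0."""
    if locale_a == locale_b:
        return 1.0
    return GAMMA_TABLE.get(frozenset((locale_a, locale_b)), 0.0)


def composition_is_valid(source_locales, target_locale) -> bool:
    """Eq. (2): every source locale must correlate with the target locale."""
    return all(gamma(loc, target_locale) != 0 for loc in source_locales)
```

For example, under these assumed tables, match_type("COX1", "Protein") yields an is-a match, and a composition whose sources sit in "stomach" and "small intestine" is valid for a target in "stomach", whereas adding a source in an uncorrelated locale makes it invalid.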
establish a service composition lead between the subscriber and the publisher.

Parameter Level Filtering. We distinguish two types of mining: fixed scope mining and incremental mining. In fixed scope mining, the parameter level filtering is triggered after all the Web services in the focused library are introduced (lines 18 - 37 in Figure 3 Alg-1). Fixed scope mining can be used when the mining context is clearly defined and the search space can be easily determined. In incremental mining, instead of identifying OP_intf(C) before introducing Web services into the mining process, OP_intf(C) grows as Web service operations are identified and introduced. Incremental mining is more flexible than fixed scope mining, since it does not require a predefined mining context. While it may involve a more diverse range of Web services and thus take longer during the screening phase, incremental mining offers a greater potential for discovering interesting compositions than fixed scope mining.

Function generateLeads() generates a lead tree rooted at operation op_intf, listing as its child nodes the operations whose output parameters match its input parameters.

Fig. 3 Alg-2 and Alg-3 show the algorithms used by an ontology index node agent to register the publication of an output parameter (publish(p_out)) or the subscription of an input parameter (subscribe(p_in)). Within the ontology index hierarchy as shown on the right of Figure 2, publication and subscription on a node can sometimes propagate to other nodes. This happens when the node is involved in an inheritance or compositional relationship with other nodes. In general, publication propagates down a composition tree (lines 02 to 09 in publish(p_out)) and up an inheritance tree (lines 10 to 17 in publish(p_out)), while subscription propagates up a composition tree (lines 02 to 09 in subscribe(p_in)) and down an inheritance tree (lines 10 to 17 in subscribe(p_in)) in the ontology hierarchy. Note that to help reduce overhead, lines 04-06 and 12-14 in both publish(p_out) and subscribe(p_in) instantiate a node agent only when the node is referenced by at least one parameter.

3.2.1 Performance Analysis

We compare the computational complexity of the screening algorithms against a naive exhaustive search algorithm using operation recognition at both the operation and parameter levels. Table 1 lists relevant variables used in our complexity analysis.

Table 1. Symbols and Parameters
Variables
  N_op     Number of operation interfaces in the mining context
  N_pin    Average # of input parameters to an operation
  N_pout   Average # of output parameters from an operation
  N_ws     Number of Web services in the focused library
  N_si     Average # of operation interfaces each Web service implements
  N_oc     Average # of operation interfaces each operation consumes
  |Ont|    Size of domain ontologies
Performance measurement parameters
  T_op     Time for operation filtering
  T_mp     Time for message/parameter filtering
  T        Total screening time (T = T_op + T_mp)

Table 2. Performance Comparison
Our Screening Algorithm
  T_op = O[N_op + N_ws · min(N_si log N_op (1 + N_oc), N_op log N_si (1 + N_oc))]
  T_mp = O[N_op (N_pin + N_pout) log(|Ont|)]
Exhaustive Search
  T_op = O[N_ws^2 · min(N_si N_oc log N_si, N_si^2 log N_oc)]
  T_mp = O[N_ws^2 N_si^2 · min(N_pin log N_pout, N_pout log N_pin)]

Figure 3 Alg-1 (a) assumes that N_op > N_si. If N_op falls well under N_si, then to improve performance we can easily change Alg-1 (a) to iterate through operation interfaces in OP_intf(C) and check if they are implemented by s.Operations. If we refer to the size of collection s.Operations as |C|, then the time to carry out a hashtable-based check of the ∈ operation is O[log(|C|)]. Table 2 shows the performance difference between the algorithms used in our screening phase for a fixed scope mining and a traditional exhaustive search algorithm. Note that we choose fixed scope mining in the comparison since it yields a performance that corresponds to the upper bound of that of incremental mining, given the same number of Web services involved in the mining. Table 2 shows that when N_op is relatively small and stable as compared to N_ws, T in our filtering algorithm is linear in N_ws, while T in a traditional exhaustive search is quadratic in N_ws.

[Figure: plots versus Number of Operations (0 to 5000) for |Ont| = 5000, 10000, 20000 and 50000. Recoverable panel titles: (a) Number of Bound Operations; (c) Number of Interesting Compositions.]

4 Simulation Results

We study the effects of variables listed in Table 3 on discovery output variables including the number of completely bound operations, the number of interesting compositions, and the average values of their interestingness. We focus on the study of interestingness of compositions obtained through indirect recognition since they require more computation according to Eqs. (5), (6) and (7).

Table 3. Experiment Settings
  Variable                             Value or Range
  ε_0                                  0.1
  Number of domains                    50
  Expected max # domains in comp.      5
  Operation interfaces per domain      5 - 100
  Input parameters per operation       0 - 5
  Output parameters per operation      0 - 5
  Pre/Post-condition range (float)     0 - 1
  Domain ontology index nodes |Ont|    5000, 10000, 20000, 50000
  c_c / c_p / c_n / c_s                0.4 / 0.6 / 0.4 / 0.6

For each domain operation, we generate its input/output parameters such that the number of these parameters falls uniformly in the range of 0 to 5. Each of these parameters is associated with a Domain Ontology Index Node (DOIN), which is identified with a sequence number. For simplicity, we flatten all the DOINs (i.e., no inheritance and composition relationships among ontology nodes) so that only exact matches will be considered. We place these DOINs in a circular buffer so that the last sequence number is next to the first one. To simulate the cohesive nature of DOINs in a domain, we pick them for the domain using a Gaussian distribution around a mean sequence number randomly chosen for the domain according to a uniform distribution. We assume that each parameter has an equal chance of being associated with a DOIN. To simulate the pre- and post-conditions, each parameter is symbolically given a range randomly chosen between 0 and 1 using a uniform distribution. We use the overlap of two such ranges (see Eq. (4)) to calculate the contribution of these conditions towards the similarity of two operations. When calculating interestingness, we chose to assign a bigger weight to surprisingness (c_s) than to novelty (c_n) due to c_s's higher sensitivity towards the increase of the total number of operations. Finally, we use an interestingness threshold of 0.5 to determine whether a composition is interesting. This threshold can be changed as needed to suit what is considered an acceptable level of interestingness.
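The data generation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the standard deviation sigma, and the number of DOINs drawn per domain are hypothetical, since the text does not give them. Eq. (4) is not reproduced in this section, so the overlap function below simply returns the length of the intersection of two ranges as a stand-in and should not be read as the paper's formula.

```python
import random


def pick_doins(ont_size, count, sigma, rng):
    """Draw `count` DOIN sequence numbers clustered around a random mean.

    The mean is chosen uniformly; Gaussian draws wrap around modulo
    `ont_size`, which models the circular buffer of sequence numbers.
    `sigma` is an assumed spread, not a value from the paper.
    """
    mean = rng.randrange(ont_size)
    return [round(rng.gauss(mean, sigma)) % ont_size for _ in range(count)]


def random_range(rng):
    """Return a (low, high) condition range with both ends uniform in [0, 1)."""
    a, b = rng.random(), rng.random()
    return (min(a, b), max(a, b))


def overlap(r1, r2):
    """Length of the intersection of two ranges (assumed stand-in for Eq. (4))."""
    return max(0.0, min(r1[1], r2[1]) - max(r1[0], r2[0]))


def random_operation(doins, rng):
    """Generate an operation with 0 to 5 input and 0 to 5 output parameters.

    Each parameter is a (DOIN, condition range) pair; the DOIN is picked
    uniformly from the domain's DOINs.
    """
    def params():
        n = rng.randint(0, 5)  # inclusive on both ends
        return [(rng.choice(doins), random_range(rng)) for _ in range(n)]

    return {"inputs": params(), "outputs": params()}


rng = random.Random(42)  # fixed seed so a run is repeatable
domain_doins = pick_doins(ont_size=5000, count=20, sigma=50.0, rng=rng)
operation = random_operation(domain_doins, rng)
```

Because all DOINs are flattened, two parameters match in this setup only when their DOIN sequence numbers are equal; the overlap of their ranges then contributes to the similarity score.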