
A Web Service Mining Framework

George Zheng† and Athman Bouguettaya†‡



† Virginia Tech, Blacksburg, Virginia, USA
‡ CSIRO ICT Centre, Canberra, ACT, Australia
{gzheng,athman}@vt.edu

2007 IEEE International Conference on Web Services (ICWS 2007)
0-7695-2924-0/07 $25.00 © 2007

Abstract

We propose a service mining framework for exploring interesting compositions of existing Web services. The framework first screens Web services for composition leads using a "coarse-grained" filtering approach. It then verifies these leads based on runtime conditions. Top candidates are selected from the verified leads and evaluated for their interestingness. We present algorithms to automate the screening phase of the framework. Finally, we study the effects of key variables on lead compositions' interestingness. As a motivating example, we apply these algorithms to the field of biological pathway discovery and rely on knowledge obtained from reverse engineering online resources to assess their effectiveness.

1 Introduction

The Web is poised to transition from a data Web to a service Web where Web applications, also known as Web services, would be the first-class objects. As Web service technology continues to mature, it is expected that a large number of Web services will be deployed to the Web. The increase in their availability is also expected to lead to the natural next step in the evolution of Web services, spurring both the need and the opportunity to break new ground on Web service mining, much as easy access to a glut of data provided fertile ground for data mining research. We define Web service mining as a search process aiming at the discovery of unanticipated and interesting compositions of Web services. We believe that Web service mining would be key to leveraging the large investments in applications that have so far operated as non-interoperable silos. In anticipation of the need and opportunities to mine Web services, our research focuses on developing a framework that would facilitate the related mining activities. Our objectives are to: 1. identify the types of activities involved in the mining process; 2. develop effective strategies to streamline, and efficient algorithms to automate, much of these activities; 3. develop measures that can be used to objectively evaluate the interestingness of the mining results; and 4. demonstrate the usefulness of the framework through a motivating use case.

A key characteristic distinguishing Web service mining from traditional Web service composition approaches, as governed by standards such as WSFL, XLANG, BPEL4WS, DAML-S and OWL-S, is that Web service mining is driven by the desire to find any unanticipated and interesting compositions of existing Web services. Traditional composition approaches are usually driven by a top-down strategy, which first requires a user to provide a goal containing a fixed set of specific criteria. It then uses these criteria to search for matching component Web services. Since the goal provided by the user already implies what type of compositions the user anticipates, the evaluation of composition interestingness is not a major concern in these approaches. In the absence of such a top-down query, Web service mining techniques need to address how the interestingness of service compositions can be determined. The lack of specific goals in Web service mining also lends itself naturally to a bottom-up strategy. The simplest approach following this strategy would be an exhaustive search for composability between all Web services. This approach does not scale well since it would inevitably result in a "combinatorial explosion" problem when faced with a large number of Web services. As we look for efficient techniques, similarities between Web services and molecules offer some interesting insights. Web services can be thought of in many ways as similar to molecules in the natural world. Like a molecule, a Web service has both attributes and dynamic behaviors. Like a molecule formed from constituent atoms and/or simpler molecules, a composite service is composed of component services. Under the right conditions, certain molecules can recognize each other and form bonds. The concept of recognition can be easily extended to the Web service world to help devise mining techniques on Web services. The similarity between molecules and Web services also motivated us to apply our mining framework back to the field of bioinformatics. As a result, the discovery of biological pathways
came across as a natural application of our mining framework. The idea is to model biological entities as individual Web services [4] and use the mining framework to help discover linkages across isolated lab findings as captured in these models. The mining of these linkages (i.e., pathways) through the vehicle of Web service models is expected to complement and, when enough details are captured in the models, present an inexpensive and accessible alternative to existing in vitro and/or in vivo exploratory mechanisms. The discovery of these pathways is expected to deepen our understanding of how diseases come about and help expedite drug discovery. Although our experiments are limited in scope, we have confirmed the effectiveness of our screening algorithms in identifying potential pathway leads. The verification and evaluation of these leads are still ongoing work.

We organize the remainder of the paper as follows. In Section 2, we present our mining framework and introduce several concepts used as its basis. In Section 3, we describe in detail the generation of the focused library and our screening algorithms. In Section 4, we use simulation to study the effects of key variables on the interestingness of lead compositions. In addition, we show how the framework can be used to mine pathways linking biological Web services provided by entities such as Aspirin. We conclude the paper with a discussion of future work in Section 5.

2 Web Service Mining Framework

Our Web service mining framework, as shown in Figure 1, can be figuratively described using the "sow, grow, weed and harvest" analogy. The framework starts with scope specification (left of figure) by a domain expert for defining the context of mining. We expect the domain expert to have a general idea about the "seeds" of Web service functional areas (e.g., cell enzyme and drug functions) that he/she is interested in mining. Such seeds are expected to grow into fruitful compositions (e.g., Aspirin pathways) as the mining progresses. Weights may be assigned to these seeds to differentiate the user's interest in them and to stimulate the growth of compositions encompassing the corresponding functional areas. To curb the problem of combinatorial explosion, the mining context is used in the search space determination phase for defining a focused library of existing Web services as the initial pool for further mining. Web services in the focused library are then filtered through the screening phase (or the growing phase), which identifies potentially interesting composition leads of Web services. This is achieved using a "coarse-grained" ontology-based filtering mechanism (see Section 3.2), which inspects only a subset of matching Web service characteristics (i.e., operation signature, message semantics) that can be quickly processed. The composition leads are then verified in the verification phase (or the weeding phase) using invocation plans and additional matching characteristics such as runtime conditions, which require more intensive computations. The evaluation phase (or the harvest phase) is used to evaluate the interestingness of initial invocation plans proposed by the verification module, devise modifications to the plans, and direct the verification module to verify the validity of these modified plans.

Figure 1. Web Service Mining Framework
[Figure not reproduced. Labels recoverable from the diagram: Start, Scope Specification, Mining Context, Domain Ontologies, Locale, Assign Weights, Search Space Determination, Focused Library, Screening, Lead Compositions, Verification, Conditions, Invocation Plans, Evaluation, Interesting Compositions/Pathways of Web Services, Bottom-up Mining; Service Registry with Publish and Find links to Service Providers and Service Consumers.]

2.1 Mining Concepts

In this section, we describe the service ontology and several other concepts that serve as the basis of our mining framework. These include operation/service recognition, used in our screening algorithms, and operation similarity and interestingness, two related concepts used in the evaluation phase to objectively measure how interesting a composition really is. We use Figure 2, which contains Web service models of biological entities, to illustrate some of these concepts.

2.1.1 Web Service Ontology

We rely on OWL-S to define our Web services with a WSDL grounding. We refer to the applicability contained in the OWL-S service profile as locale in this paper. To recognize the fact that certain services (e.g., payment) may be involved in multiple OWL-S categories of services (e.g., healthcare, travel, legal), we use the concept of domain to group relevant operations, or more appropriately, operation interfaces, into the same category. An operation interface specifies a shared functionality implemented by operations from different Web services. A Web service's involvement with a domain is reflected by whether it supplies or consumes an implementation of an operation interface in such

a domain. We assemble a hierarchy of indices (middle of Figure 2) to existing domain ontologies (e.g., Enzymes) to unambiguously categorize the type of operation inputs and outputs.

Figure 2. Web Service Models
[Figure not reproduced. Labels recoverable from the diagram: Web Service Registry holding the services Aspirin (block COX1, block COX2), Celebrex, COX1 (produce PG), Stomach Cell (produce mucus) and Mucus (cover stomach wall); Domain Ontology Indices with nodes NSAIDs (Aspirin, Celebrex), Enzymes (COX1), PGG2, PGI2, Fatty Acids (Arachidonic Acid, Omega-3), Mucus and energy; a NodeAgent per index node that keeps track of parameter pub/sub in Alg-2 and Alg-3 as described in Section 3.2. Legend: Web service, function/behavior, ontology node, ontology extension, component, inheritance parent, input of ontology node type meeting pre-condition, function having output of ontology node type and post-condition.]

2.1.2 Recognition and Composition

Much like molecules in the natural world, which can recognize each other and form bonds [2], Web services and operations can also recognize each other through both syntax and semantics. Consequently, they can compose and bring about potentially interesting behaviors. We identify two types of operation recognition, direct and indirect recognition, and two types of service recognition, promotion and inhibition.

Direct Recognition. A direct recognition is established between operations $op_a$ and $op_b$ if $op_a$ consumes an operation interface $op_{intf}$, which is implemented by $op_b$. In addition, $op_a$ and $op_b$ must be mode, binding and message composable [3].

Indirect Recognition. A target operation $op_t$ indirectly recognizes a source operation $op_s$ if $op_s$ generates some or all input parameters of $op_t$. An example of indirect recognition is shown in Figure 2 between operation produce mucus from the Stomach Cell service and polymorphic operation produce PG from the service of a type of enzyme called COX1. We use the term indirect to indicate the fact that there is a potential need to relay parts of the output message from $op_s$ to parts of the input message to $op_t$ at the composition level. A bond is established between $op_s$ and $op_t$ for each input parameter $op_t$ can receive from $op_s$. We denote the set of bonds between $op_s$ and $op_t$ as $B(op_s \to op_t)$. If we refer to the set of all operations that $op_t$ recognizes as $OP_s(\to op_t)$, then

\[ OP_s(\to op_t) = \{\, op \mid B(op \to op_t) \neq \emptyset \,\} \tag{1} \]

Promotion. When operation $op_1$ of service $s_a$ produces an entity (i.e., output parameter) that in turn provides service $s_b$, we say that $s_a{:}op_1$ promotes $s_b$. In a bioinformatic setting, an increase in the quantity of a service-providing entity increases the availability of the service. For example, the quantity of Mucus is increased by (Stomach Cell service:produce mucus) in Figure 2. Consequently, the Mucus service becomes more available on the wall of the stomach, which becomes better protected from erosion and ulcers caused by the gastric juice that is present there. Thus by definition, (Stomach Cell service:produce mucus) promotes the Mucus service.

Inhibition. Similarly, when operation $op_1$ of service $s_a$ consumes an entity (i.e., input parameter) that in turn provides service $s_b$, we say that $s_a{:}op_1$ inhibits $s_b$. Figure 2 shows an example of inhibition between Aspirin service:block COX1 and the COX1 service.

For promotion, inhibition and indirect recognition, we identify three types of matching between parameters $p_1$ and $p_2$, whose data types refer to domain ontology index nodes $n_a$ and $n_b$, respectively:

• Exact match: $n_a = n_b$
• Is-a: $n_a$ is a child of $n_b$
• Has-a: $n_a$ has a component $n_b$

We assume that the above relationships among parameter types are already declared in domain ontologies and thus can be automatically detected.

Composition Validity. Various measures [3] have been proposed to determine whether two operations are composable at both syntactic and semantic levels. These measures can be used to determine whether a direct recognition-based composition is actually valid. Promotion and inhibition-based compositions are valid because the entities of interest provide the corresponding services by declaration. In this section, we focus on how the validity of an indirect recognition-based composition can be determined in the verification phase. We denote by $comp(OP_s, op_t)$ an operation composition involving a set of source operations $OP_s$ providing input parameters to target operation $op_t$, where $OP_s \subset OP_s(\to op_t)$. In order for $comp(OP_s, op_t)$ to be valid, the following must be true:

\[ \forall op_s \in OP_s,\quad \Gamma[op_s.L,\, op_t.L] \neq 0 \tag{2} \]

In Eq. (2), $\Gamma$ is a domain expert-determined correlation function that measures the relevancy (i.e., 1 for the same and non-zero for related) of two locales. Eq. (2) states that in order for the composition to be valid, each of the source operations must have a locale that correlates to that of the target operation. A relevant bioinformatic example would be to make drug molecules effective (or compose with disease

causing molecules) by sending them to where the disease cells are located.

If we use $B^{t}_{selected}(op_s \to op_t)$ to denote a set of $op_t$'s input parameters (i.e., target parameters) covered by a selected subset of $B(op_s \to op_t)$ for $comp(OP_s, op_t)$, $B^{s}_{selected}(op_s \to op_t)$ the corresponding output parameters from $op_s$, and $EA_{comp}$ the external attributes involved in $comp(OP_s, op_t)$, then in order for the composition to be valid, the following must also be true:

\[
\begin{aligned}
&\exists F(OP_s) : \forall B^{t}_{selected}(op_s \to op_t) \text{ where } op_s \in OP_s, \\
&\sum_{\substack{op_s \in OP_s \\ f \in F(OP_s)}} \{ (C_{post}(f(op_s)) \text{ on } B^{s}_{selected}(op_s \to op_t) \text{ and } EA_{comp}) \} \\
&\qquad \supset \{ C_{pre}(op_t) \text{ on } B^{t}_{selected}(op_s \to op_t) \text{ and } EA_{comp} \}
\end{aligned}
\tag{3}
\]

$F(OP_s)$ refers to composition-specific mediations that need to be applied to $OP_s$ so that the combined post-conditions $C_{post}$ of $OP_s$ cover the space carved out by the pre-conditions $C_{pre}$ of $op_t$ if all $B^{s}_{selected}(op_s \to op_t)$ are replaced by the corresponding $B^{t}_{selected}(op_s \to op_t)$. Consequently, invocation of $op_t$ is activated.

2.1.3 Operation Similarity

The concept of operation similarity is relevant when we study the interestingness of an indirect recognition-based composition. The similarity of two operations can be measured by comparing their input parameter sets, output parameter sets, pre-conditions and post-conditions. We use the following function to measure the similarity between $op_i$ and $op_j$:

\[
\begin{aligned}
Sim(op_i, op_j) = {} & c_p \left( \frac{|P_{in}(op_i) \cap P_{in}(op_j)|}{|P_{in}(op_i) \cup P_{in}(op_j)|} \times \frac{|P_{out}(op_i) \cap P_{out}(op_j)|}{|P_{out}(op_i) \cup P_{out}(op_j)|} \right) \\
& + c_c \left( \frac{|C_{pre}(op_i) \cap C_{pre}(op_j)|}{|C_{pre}(op_i) \cup C_{pre}(op_j)|} \times \frac{|C_{post}(op_i) \cap C_{post}(op_j)|}{|C_{post}(op_i) \cup C_{post}(op_j)|} \right)
\end{aligned}
\tag{4}
\]

where $c_p$ and $c_c$ are weights such that $0 \le c_p, c_c \le 1$ and $c_p + c_c = 1$. $|P|$ and $|C|$ give the size of parameter set $P$ and condition set $C$, respectively. According to Eq. (4), $Sim(op_i, op_j)$ ranges from 0 to 1, with 1 indicating that the two operations have the same parameters and conditions.

2.1.4 Interestingness

In the context of Web service mining, interestingness indicates how interesting a Web service composition is. A direct recognition-based composition is interesting if it exhibits better qualities than all previously discovered similar operations. For indirect recognition, promotion and inhibition-based compositions, interestingness may be less certain, and subjective knowledge from a domain expert may be needed to help with the determination. Because the number of such compositions is potentially large, we devise the following objective measure of interestingness, $I$, to help reduce the candidate pool for final consideration:

\[ I = A\,(c_n N + c_s S) \prod_{i=1}^{m} w_i \tag{5} \]

where

• $A$ is the actionability of the composition,
• $N$ is the novelty of the composition,
• $S$ is the surprisingness of the composition,
• $c_n$ and $c_s$ are weights such that $0 \le c_n, c_s \le 1$ and $c_n + c_s = 1$,
• $m$ is the number of expert-assigned weights $w_i$ ($w_i > 1$) to operation interfaces and domain ontology index nodes that are involved in a composition. We choose to multiply all such weights involved in a composition to reflect their subjective interestingness-enhancing effect.

Actionability. We define actionability as a binary value (i.e., 1 for actionable, 0 for non-actionable) representing whether the composability of a composition can be verified through simulation or live execution. A non-actionable composition is considered uninteresting. Thus actionability contributes multiplicatively towards the overall interestingness.

Novelty. Novelty, $N$, measures how unique and new a composition is. We use the following function to calculate novelty:

\[
N = \begin{cases}
1, & \text{promot. or inhibit.} \\
1 - \max_{op \in D} Sim(comp(OP_s, op_t), op), & \text{indirect recog.}
\end{cases}
\tag{6}
\]

For both promotion and inhibition, the novelty is set to 1 due to the validity of the composition. For indirect recognition, $D$ is a reference set of domains. The more similar the composed operation is to an existing operation, the less novel it is regarded. In this case, $N$ can vary between 0 and 1. Since $D$ needs to be large enough to ensure the uniqueness of the composition, the check of novelty in the case of indirect recognition for all compositions found in $D$ could be a very expensive task. For this reason, we carry it out in the final phase of the mining process, where the number of leads is presumably small.

Surprisingness. Surprisingness, $S$, indicates how unexpectedly a Web service composition is achieved. We use the following objective function to measure it:

\[
S = \begin{cases}
\dfrac{\varepsilon}{\min_{d_s \in D(s)} \Gamma[d(op), d_s]}, & \text{promot. or inhibit.} \\[2ex]
\dfrac{\varepsilon \times n_d}{N_t \times \min_{op \in OP_s} \Gamma[d(op), d(op_t)]}, & \text{indirect recog., } n_d \le N_t \\[2ex]
\dfrac{\varepsilon}{\min_{op \in OP_s} \Gamma[d(op), d(op_t)]}, & \text{indirect recog., } n_d > N_t
\end{cases}
\tag{7}
\]

For promotion or inhibition involving $op$ and $s$, $D(s)$ is the set of domains $s$ is involved in. For indirect recognition, $n_d$ is the number of different domains involved in the composition, and $N_t$ is the expected maximum number of domains a composition may have, used for normalization purposes. In all cases, $d()$ gives the domain of a given service operation. $\varepsilon$ is a small number ($0 < \varepsilon < 1$) used to normalize the

value of $S$. $\Gamma$ is a correlation function that measures the relevancy of two domains or the cohesion of the same domain (when $i = j$). It is defined for domains $d_i$ and $d_j$ as:

\[ \Gamma[d_i, d_j] = e^{-\frac{1}{\varepsilon_0 (n+1)}} \tag{8} \]

where $n$ is the number of unique pairs of operations, $\{(op_i, op_j) \mid op_i \in d_i, op_j \in d_j\}$, that are previously known to have been involved in a composition. When $n = 0$, the correlation between two domains in Eq. (8) is assigned an initial value of $\varepsilon$ ($\varepsilon = e^{-\frac{1}{\varepsilon_0}}$). This helps bound the surprisingness of a service composition in Eq. (7). Eq. (8) also shows that $\Gamma$ for two domains quickly approaches 1 as $n$ increases. A composition achieved by combining component services from relatively few domains, or from domains that are previously known to be very relevant, is therefore less surprising than one from a diverse set of domains or from domains that are previously known to be less relevant.

Eq. (7) aims at objectively measuring surprisingness. However, surprisingness is sometimes subjective, i.e., the user evaluating it may choose to use subjective measures. The reference base of such measures may be personal knowledge, belief, bias and needs. Unfortunately, approaches based solely on subjective measures tend to inhibit us from getting interesting compositions that were not thought of. An extreme case of relying on subjective measures to carry out Web service mining is the traditional composition approach, where the user issues a query specifying the composition in pursuit to start the search process. A reasonable compromise between a purely objective approach and a purely subjective approach would be to use a Bayesian approach to refine the subjective reference base and converge it to the reality of composition opportunities. We envision that this involves an iterative process of the following steps:

• detection of potential presence of a composition,
• conception of an invocation plan,
• execution of the invocation plan,
• evaluation of execution results,
• modification and re-execution of invocation plans if necessary.

We focus on objective measures in our research but take advantage of subjective measures at the beginning of our mining process, when they might be considered necessary to bootstrap the process.

3 Framework Details/Algorithms

In this section, we describe in detail the generation of the focused library and our screening algorithms.

3.1 Focused Library Generation

The mining process starts with a domain expert specifying a scope for the mining activity. The scope specification phase takes advantage of necessary subjective interestingness measures when defining the locale of interest and a list of domains to be considered for mining. For example, the locale may be the brain and the domains may include cell enzyme functions and drug functions. Based on such mining interest, the scope of the mining, or mining context, can be determined using:

\[ C = \{\, d(L) \mid d \in D \,\} \tag{9} \]

where

• $D$ is a set of Web service domains,
• $L$ is a set of locale attributes of mining interest,
• $d(L)$ is a domain carved out by $L$.

Consequently, the set of all operation interfaces included in $C$ is denoted as $OP_{intf}(C)$:

\[ OP_{intf}(C) = \{\, op_{intf} \mid \exists d \in D \wedge op_{intf} \in d(L) \,\} \tag{10} \]

Since different domains may rely on different ontologies to describe relevant concepts or constructs within them, the specification of the mining context essentially determines a set of domain ontologies (e.g., NSAIDs, Enzymes in Figure 2) to use for the mining process. Assume $R$ is the Web service registry. The focused library of Web services, $F$, can be calculated using:

\[
\begin{aligned}
F = \{\, s \mid {} & s \in R \wedge (s.Operations \cap OP_{intf}(C) \neq \emptyset \;\vee \\
& \exists op \in s.Operations : op_{consume}(OP_{intf}) \cap OP_{intf}(C) \neq \emptyset) \,\}
\end{aligned}
\tag{11}
\]

3.2 Screening

To address the combinatorial explosion problem mentioned earlier, our screening phase uses a publish/subscribe mechanism to convert the traditional combinatorial search problem into a spontaneous operation recognition problem. This is achieved in two steps: operation level filtering and parameter level filtering. We list the algorithms for both in Figure 3 Alg-1.

Operation Level Filtering. At the operation level, operation interfaces within the mining context serve as the medium for Web service operations to plug into each other. Figure 3 Alg-1 (a) shows the operation level filtering algorithm. Service operations that implement a particular interface publish their implementation through that interface (lines 06 - 09). Service operations that need to invoke the implementation of an interface subscribe to that interface (lines 10 - 15). An operation agent is created (lines 01 - 03) for each operation interface to keep track of references to it from various operations. When publishing an operation that implements an interface, function publish(op) of the corresponding agent checks whether there is any subscriber to the interface. If so, it tries to establish a service composition lead using direct recognition between the publisher and the subscriber. Similarly, when an operation subscribes to an interface that it consumes, function subscribe(op) checks whether there is any publisher that implements the interface. If so, it tries to

Alg-1: Operation and Parameter Level Filtering
Input: Context operation interfaces OP_intf(C), focused library F, ontology O.
Output: Leads of composed Web services L.
Variables: Leads from publication and subscription L_ps, operation interfaces consumed by op, op_consume(OP_intf).

(a). Operation Level Filtering
(01) For each op_intf ∈ OP_intf(C)
(02)   create Agent(op_intf);
(03) EndFor
(04) For each s ∈ F
(05)   For each op ∈ s.Operations
(06)     If (∃ op_intf ∈ OP_intf(C): op impl op_intf)
(07)       L_ps ← Agent(op_intf).publish(op);
(08)       L.add(L_ps);
(09)     EndIf
(10)     For each op_intf ∈ op_consume(OP_intf)
(11)       If (op_intf ∈ OP_intf(C))
(12)         L_ps ← Agent(op_intf).subscribe(op);
(13)         L.add(L_ps);
(14)       EndIf
(15)     EndFor
(16)   EndFor
(17) EndFor

(b). Parameter Level Filtering
(18) For each op_intf ∈ OP_intf(C)
(19)   For each p_out ∈ op_intf.message_out
(20)     k ← type(p_out);
(21)     If (k ∈ O)
(22)       If (¬∃ Agent(k))
(23)         create Agent(k);
(24)       EndIf
(25)       Agent(k).publish(p_out);
(26)     EndIf
(27)   EndFor
(28)   For each p_in ∈ op_intf.message_in
(29)     k ← type(p_in);
(30)     If (k ∈ O)
(31)       If (¬∃ Agent(k))
(32)         create Agent(k);
(33)       EndIf
(34)       Agent(k).subscribe(p_in);
(35)     EndIf
(36)   EndFor
(37) EndFor

(c). Lead Generation
(38) For each op_intf ∈ OP_intf(C)
(39)   If (op_intf.isBound())
(40)     L_ps ← op_intf.generateLeads();
(41)     L.add(L_ps);
(42)   EndIf
(43) EndFor
(44) return L;

Alg-2: Node Agent registering publication of p_out
publish(p_out)
Input: Output parameter p_out of data type that this agent represents.
(01) publishers.add(p_out);
(02) If (CompositionalChildren ≠ ∅)
(03)   For each n ∈ CompositionalChildren
(04)     If (n ∈ O ∧ ¬∃ Agent(n))
(05)       create Agent(n);
(06)     EndIf
(07)     Agent(n).publish(p_out);
(08)   EndFor
(09) EndIf
(10) If (inheritanceParent ≠ ∅)
(11)   If (inheritanceParent ∈ O)
(12)     If (¬∃ Agent(inheritanceParent))
(13)       create Agent(inheritanceParent);
(14)     EndIf
(15)     Agent(inheritanceParent).publish(p_out);
(16)   EndIf
(17) EndIf
(18) If (subscribers ≠ ∅)
(19)   For each p_in ∈ subscribers
(20)     p_in.bonds.add(p_out);
(21)   EndFor
(22) EndIf

Alg-3: Node Agent registering subscription of p_in
subscribe(p_in)
Input: Input parameter p_in of data type that this agent represents.
(01) subscribers.add(p_in);
(02) If (CompositionalParents ≠ ∅)
(03)   For each n ∈ CompositionalParents
(04)     If (n ∈ O ∧ ¬∃ Agent(n))
(05)       create Agent(n);
(06)     EndIf
(07)     Agent(n).subscribe(p_in);
(08)   EndFor
(09) EndIf
(10) If (inheritanceChildren ≠ ∅)
(11)   For each n ∈ inheritanceChildren
(12)     If (n ∈ O ∧ ¬∃ Agent(n))
(13)       create Agent(n);
(14)     EndIf
(15)     Agent(n).subscribe(p_in);
(16)   EndFor
(17) EndIf
(18) If (publishers ≠ ∅)
(19)   For each p_out ∈ publishers
(20)     p_in.bonds.add(p_out);
(21)   EndFor
(22) EndIf

Figure 3. Screening Algorithms
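
As an illustration of Alg-2 and Alg-3 in Figure 3, the following Python sketch shows one way the node agents' publish and subscribe propagation could be written. It is a minimal sketch, not the authors' implementation: the names OntologyNode, NodeAgent and get_agent, the shared bonds dictionary, and the example parameters are all hypothetical, and the check that a node belongs to the ontology O is assumed to hold for every node.

```python
class OntologyNode:
    """A domain ontology index node (class and field names are illustrative)."""

    def __init__(self, name):
        self.name = name
        self.inheritance_parent = None
        self.inheritance_children = []
        self.compositional_parents = []
        self.compositional_children = []


def get_agent(registry, node):
    """Create the agent for `node` lazily, only when a parameter reaches it."""
    if node.name not in registry:
        registry[node.name] = NodeAgent(node, registry)
    return registry[node.name]


class NodeAgent:
    """Tracks parameters published or subscribed at one ontology node."""

    def __init__(self, node, registry):
        self.node = node
        self.registry = registry   # shared dict: node name -> NodeAgent
        self.publishers = []       # output parameters registered here
        self.subscribers = []      # input parameters registered here

    def publish(self, p_out, bonds):
        self.publishers.append(p_out)
        # Publication propagates down the composition tree ...
        for child in self.node.compositional_children:
            get_agent(self.registry, child).publish(p_out, bonds)
        # ... and up the inheritance tree.
        parent = self.node.inheritance_parent
        if parent is not None:
            get_agent(self.registry, parent).publish(p_out, bonds)
        # Bond the new output to every input already waiting at this node.
        for p_in in self.subscribers:
            bonds.setdefault(p_in, set()).add(p_out)

    def subscribe(self, p_in, bonds):
        self.subscribers.append(p_in)
        # Subscription propagates up the composition tree ...
        for parent in self.node.compositional_parents:
            get_agent(self.registry, parent).subscribe(p_in, bonds)
        # ... and down the inheritance tree.
        for child in self.node.inheritance_children:
            get_agent(self.registry, child).subscribe(p_in, bonds)
        # Bond this input to every output already published at this node.
        for p_out in self.publishers:
            bonds.setdefault(p_in, set()).add(p_out)


# Hypothetical example: COX1 is-a Enzymes (an Is-a match).
enzymes, cox1 = OntologyNode("Enzymes"), OntologyNode("COX1")
cox1.inheritance_parent = enzymes
enzymes.inheritance_children.append(cox1)

registry, bonds = {}, {}
p_out = ("op_a", "out")   # hypothetical output parameter typed COX1
p_in = ("op_b", "in")     # hypothetical input parameter typed Enzymes
get_agent(registry, cox1).publish(p_out, bonds)
get_agent(registry, enzymes).subscribe(p_in, bonds)
print(bonds)
```

Storing bonds in a set per input parameter means a bond discovered along both propagation paths is recorded only once, which is a design choice of this sketch rather than something Figure 3 specifies.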

establish a service composition lead between the subscriber scription propagates up a composition tree (lines 02 to 09 in
and the publisher. subscribe(pin )) and down an inheritance tree (lines 10 to 17

Parameter Level Filtering. We distinguish two types of in subscribe(pin )) in the ontology hierarchy. Note that to help
mining: Fixed scope mining and incremental mining. In reduce overhead, lines 04-06 and 12-14 in both publish(pout )
fixed scope mining, the parameter level filtering is triggered and subscribe(pin ) instantiate a node agent only when the
after all the Web services in the focused library are intro- node is referenced by at least one parameter.
duced (lines 18 - 37 in Figure 3 Alg-1). Fixed scope mining
3.2.1 Performance Analysis
can be used when the mining context is clearly defined and
the search space can be easily determined. In incremental We compare the computation complexity of the screening
mining, instead of identifying OPintf (C) before introducing algorithms against a naive exhaustive search algorithm us-
Web services into the mining process, OPintf (C) grows as ing operation recognition at both the operation and param-
Web service operations are identified and introduced. The eter levels. Table 1 lists relevant variables used in our com-
incremental mining is more flexible than the fixed scope plexity analysis.
mining, since it does not require a predefined mining con-
text. While it may involve a more diverse range of Web Table 1. Symbols and Parameters
Variables
services and thus take longer during the screen phase, in- Nop Number of operation interfaces in the mining context
cremental mining offers a greater potential of discovering Npin Average # of input parameters to an operation
Npout Average # of output parameters from an operation
more interesting compositions than the fixed scope mining. Nws Number of Web services in the focused library
Nsi Average # of operation interfaces each Web service implements
Function generateLeads() generates a lead tree rooted at Noc Average # of operation interfaces each operation consumes
operation opintf listing as its child nodes operations whose |Ont| Size of domain ontologies
output parameters match its input parameters. Performance measurement parameters
Top Time for operation filtering
Fig. 3 Alg-2 and Alg-3 show the algorithms used by Tmp Time for message/parameter filtering
T Total screening time (T = Top + Tmp )
an ontology index node agent to register the publication
of an output parameter (publish(pout )) or the subscription of
Table 2. Performance Comparison
an input parameter (subscribe(pin )). Within the ontology in- Our Screening Algorithm
dex hierarchy as shown on the right of Figure 2, publica- Top = O[Nop + Nws (min(Nsi logNop (1 + Noc ), Nop logNsi (1 + Noc ))]
Tmp = O[Nop (Npin + Npout )log(|Ont|)]
tion and subscription on a node can sometimes propagate Exhaustive Search
2 min(N N logN , N 2 logN )]
Top = O[Nws
to other nodes. This happens when the node is involved si oc
2 N 2 min(N
Tmp = O[Nws
si si oc
pin logNpout , Npout logNpin )]
si
in an inheritance or compositional relationship with other
nodes. In general, publication propagates down a compo- Figure 3 Alg-1 (a) assumes that Nop > Nsi . If Nop
sition tree (lines 02 to 09 in publish(pout )) and up an in- falls well under Nsi , then for improving the performance,
heritance tree (lines 10 to 17 in publish(pout )), while sub- we can easily change Alg-1 (a) to iterate through opera-

2007 IEEE International Conference on Web Services (ICWS 2007)


0-7695-2924-0/07 $25.00 © 2007
Figure 4. Effects of Key Variables. Panels: (a) Number of Bound Operations, (b) Average Interestingness, (c) Number of Interesting Compositions, (d) Percent of Interesting Compositions. Each panel is plotted against Number of Operations (0 to 5000) for |Ont| = 5000, 10000, 20000 and 50000.

tion interfaces in $OP_{intf}(C)$ and check if they are implemented by s.Operations. If we refer to the size of collection s.Operations as $|C|$, then the time to carry out a hashtable-based check of the ∈ operation is $O[\log(|C|)]$. Table 2 shows the performance difference between the algorithms used in our screening phase for fixed-scope mining and a traditional exhaustive search algorithm. Note that we choose fixed-scope mining for the comparison because its cost corresponds to the upper bound of that of incremental mining, given the same number of Web services involved in the mining. Table 2 shows that when $N_{op}$ is relatively small and stable compared to $N_{ws}$, $T$ in our filtering algorithm is linear in $N_{ws}$, while $T$ in a traditional exhaustive search grows quadratically with $N_{ws}$.

4 Simulation Results

We study the effects of the variables listed in Table 3 on discovery output variables, including the number of completely bound operations, the number of interesting compositions, and the average values of their interestingness. We focus on the interestingness of compositions obtained through indirect recognition, since they require more computation according to Eqs. (5), (6) and (7).

Table 3. Experiment Settings

Variable                              Value or Range
$\varepsilon_0$                       0.1
Number of domains                     50
Expected max # domains in comp.       5
Operation interfaces per domain       5 to 100
Input parameters per operation        0 to 5
Output parameters per operation       0 to 5
Pre/Post-condition range (float)      0 to 1
Domain ontology index nodes $|Ont|$   5000, 10000, 20000, 50000
$c_c / c_p / c_n / c_s$               0.4/0.6/0.4/0.6

For each domain operation, we generate its input/output parameters such that the number of these parameters is uniformly distributed in the range of 0 to 5. Each of these parameters is associated with a Domain Ontology Index Node (DOIN), which is identified by a sequence number. For simplicity, we flatten all the DOINs (i.e., no inheritance or composition relationships among ontology nodes) so that only exact matches are considered. We place these DOINs in a circular buffer so that the last sequence number is next to the first one. To simulate the cohesive nature of DOINs in a domain, we pick them for the domain using a Gaussian distribution around a mean sequence number randomly chosen for the domain according to a uniform distribution. We assume that each parameter has an equal chance of being associated with a DOIN. To simulate the pre- and post-conditions, each parameter is symbolically given a range randomly chosen between 0 and 1 using a uniform distribution. We use the overlap of two such ranges (see Eq. (4)) to calculate the contribution of these conditions towards the similarity of two operations. When calculating interestingness, we chose to assign a bigger weight to surprisingness ($c_s$) than to novelty ($c_n$) due to $c_s$'s higher sensitivity to increases in the total number of operations. Finally, we use an interestingness threshold of 0.5 to determine whether a composition is interesting. This threshold can be changed as needed to reflect what counts as acceptable interestingness.
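The data generation described above can be sketched roughly as follows. This is only an illustrative sketch, not the simulator used for the reported results: the function names, the spread `sigma` of the Gaussian, the fixed seed, and the use of raw overlap length as the range-overlap measure are all assumptions (Eq. (4) may normalize the overlap differently).

```python
import random


def sample_doin(mean_seq, sigma, n_doins, rng):
    """Draw a DOIN sequence number near mean_seq on a circular buffer.

    The modulo wraps draws that fall off either end, so the last
    sequence number is adjacent to the first one.
    """
    return round(rng.gauss(mean_seq, sigma)) % n_doins


def random_range(rng):
    """Symbolic pre/post-condition: a sub-range of [0, 1]."""
    lo, hi = sorted((rng.random(), rng.random()))
    return lo, hi


def range_overlap(a, b):
    """Length shared by ranges a and b (0.0 if they are disjoint).

    Assumption: plain overlap length stands in for Eq. (4).
    """
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def make_operation(mean_seq, sigma, n_doins, rng):
    """Generate one operation with 0 to 5 input and output parameters."""
    def make_params():
        count = rng.randint(0, 5)  # uniform, both ends inclusive
        return [(sample_doin(mean_seq, sigma, n_doins, rng), random_range(rng))
                for _ in range(count)]
    return {"inputs": make_params(), "outputs": make_params()}


rng = random.Random(42)                # fixed seed (assumption) for repeatability
n_doins = 5000                         # one of the |Ont| settings in Table 3
domain_mean = rng.randrange(n_doins)   # uniformly chosen mean for one domain
# sigma = 50.0 is a hypothetical spread; the text does not state one.
ops = [make_operation(domain_mean, 50.0, n_doins, rng) for _ in range(10)]
```

With flattened DOINs, an output parameter of one operation would bind to an input parameter of another only when both carry the same sequence number, and `range_overlap` would then feed the condition-related part of the similarity score.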

Figure 5. Discovered Pathways. The diagram links the Aspirin, PLA2, COX1, PGI2, TBXAS1, TxA2, Mucus and Gastric Juice services. Its legend distinguishes the entity providing a Web service, the Web service providing an operation, the operation, the input to an operation meeting its precondition, the output substance, and the operation postcondition.


Figure 4 (a) shows that the average number of bound operations tends to increase as the number of operations increases. This is because as more operations become available, their parameters have a higher chance of meeting at a DOIN. However, this chance tends to decrease as the number of DOINs increases.

Figure 4 (b) shows that the average interestingness decreases significantly at the beginning as the number of operations increases. It eventually approaches a steady value as the number of operations continues to increase. The number of DOINs has a dampening effect on the initial rate of decrease in average interestingness. In fact, as more DOINs become available, operations are less likely to bind with one another (see Figure 4 (a)). However, compositions involving those that do become bound tend to be more interesting since they involve domains that are less correlated.

Figure 4 (c) illustrates the relationship between the number of interesting compositions and the number of DOINs. Because of the same dampening effect of the number of DOINs as illustrated in (b), we see that as the number of DOINs increases, not only does the peak move to the right, but the reduction in the number of interesting compositions also becomes smaller. While the number of interesting compositions declines with some randomness as the number of operations increases, this randomness disappears if we look at how the percentage of interesting compositions over the number of bound operations changes (Figure 4 (d)). The smoothing effect observed in (d) is due to the fact that the expected increase in the number of bound operations follows similar random changes to those manifested in (c).

In a separate setting, we have conducted preliminary experiments to assess the effectiveness of our screening algorithms. We applied our algorithms to a list of simplified Web services put together by reverse engineering online resources such as [1] on Aspirin, COX, prostaglandin (PG), etc., and were able to identify the potential pathway leads shown in Figure 5. Figure 5 shows a pathway from COX1 to the damaging vasoconstriction operation of TxA2 and another pathway from COX1 to the beneficial service provided by Mucus. It is interesting to see that Aspirin can not only benefit us by blocking the pathway leading to problems such as heart attack and stroke, but can also bring the unwanted side effect of stomach ulcers by blocking the pathway leading to mucus production. Due to page limitations, we leave out details on how an invocation plan can be constructed for verification purposes from the root operation in a pathway lead tree.

5 Conclusion

In this paper, we proposed to use Web service mining to discover interesting compositions of existing Web services. We presented a novel Web service mining framework and algorithms that can be used to automatically screen for Web service compositions. We also presented the concept of interestingness of these compositions and proposed objective measures to evaluate it. Ongoing work includes verification and evaluation of pathway leads. We will also extend the framework to take advantage of a post-mining usefulness measure to help re-adjust the mining process and steer it along a more fruitful course.

References

[1] Aspirin. http://www3.interscience.wiley.com:8100/legacy/college/boyer/0471661791/cutting edge/aspirin/aspirin.htm.
[2] P. Ball. Designing the Molecular World - Chemistry at the Frontier. Princeton University Press, Princeton, New Jersey, 1994.
[3] B. Medjahed, A. Bouguettaya, and A. K. Elmagarmid. Composing web services on the semantic web. The VLDB Journal, September 2003.
[4] G. Zheng and A. Bouguettaya. Web service modeling for biological processes. In 3rd International Workshop on Biological Data Management (BIDM), Copenhagen, Denmark, August 2005.
