
2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery

The Case Retrieval Strategy Based on Predicting Query Performance

Shixia Ma, Dan Liu, Bo Sun


Computer Science and Technology Department, Henan Mechanic and Electric Engineering College, Xinxiang, Henan, China
e-mail: {msx123456, liudan1005}@126.com
Abstract- Case retrieval is the core of a case-based reasoning system. This paper applies query performance prediction to case retrieval in a case-based reasoning system and designs a retrieval strategy based on a heterogeneous case base. The paper analyzes how heterogeneous cases are organized, and mainly discusses the dynamic allocation strategy for characteristic item weights and the construction method of the case retrieval log. The experimental results show that the method helps users build the target case, and thereby raises both the success rate of case retrieval and the utilization rate of the case base.

Keywords- case-based reasoning system; target case; characteristic item weight; case retrieval log

978-0-7695-3735-1/09 $25.00 © 2009 IEEE. DOI 10.1109/FSKD.2009.399

I. INTRODUCTION

Case-Based Reasoning (CBR) is a strategy that revisits the processes and results stored in a repository for similar problems solved in the past, and derives new knowledge from each new case in order to adapt to the current problem. It is an important reasoning method in AI [1] [2] [3]. Case retrieval is the core of a CBR system; the retrieval quality determines the quality of the entire system [4]. The target case generated for a user query can be mapped to a point in the corresponding space. Depending on the eigenvalues the user selects for the characteristic items, the target case is mapped to different regions of the space, and the cases at a similar distance differ as well. That is, the output result set of cases also differs. When the target case does not fully reflect the needs of the user, the returned result set cannot satisfy the user either. The user then has to issue several queries to obtain the required case result set. This situation not only increases the burden on the system but also makes the system harder to use.

Predicting Query Performance (PQP) is also called Predicting Query Difficulty [5] [6] [7]. PQP estimates how good the results returned by a retrieval system for a query are when no relevance information is available. Query performance prediction is receiving more and more attention and has become one of the key functions of a retrieval system [8]. Query performance prediction methods fall into two categories: pre-retrieval and post-retrieval [9]. In this paper, post-retrieval query performance prediction is applied to case retrieval in a case-based reasoning system. The main idea is as follows: after the first case retrieval, the user-generated target case is adjusted according to the dynamically distributed weights of the characteristic items and the retrieval logs, which assists the user in rebuilding the target case.

II. THE PARTITION OF CASE BASE

The case base is an important part of a case-based reasoning system. The case set is defined as follows.

Let $C = (c_1, c_2, \ldots, c_n)$ be a non-empty finite set of $n$ cases, where $c_i$ $(1 \le i \le n)$ represents a case in the case set $C$. According to the characteristic items of the cases, the weights of the characteristic items and the relationship between the sets of characteristic items, the structure of a case base can be classified as an isomorphic case base or a heterogeneous case base [10]. This paper uses a completely heterogeneous case base to organize the cases. In a completely heterogeneous case base, the number and the types of characteristic items may differ from case to case, and the weight of the same characteristic item may also differ between cases.

Let $F = (f_1, f_2, \ldots, f_m)$ be a non-empty finite set of $m$ characteristic items, where $f_i$ $(1 \le i \le m)$ represents a characteristic item of a case. Let $W = (w_1, w_2, \ldots, w_m)$ be a non-empty finite set of $m$ weights, where $w_i$ is the weight corresponding to $f_i$, $0 \le w_i \le 1$ for $1 \le i \le m$, and

$$\sum_{i=1}^{m} w_i = 1.$$

$F$ is the set of global characteristic items in the case base, and $m$ is the largest number of characteristic items. A case $c_i = (f_1, f_2, \ldots, f_k)$ $(1 \le k \le m)$ in the case base can be regarded as a subset of $F$, and the set of its characteristic item weights $w = (w_1, w_2, \ldots, w_v)$ can be regarded as a subset of $W$. The case $c_i$ can be mapped to a point in $k$-dimensional space in accordance with its characteristic items and their eigenvalues.

According to the differences in the number and type of characteristic items contained by the cases in the case base, let $S = (s_1, s_2, \ldots, s_p)$ be a non-empty finite set of $p$ elements, where each $s_i$ $(1 \le i \le p)$ is a set composed of elements of $F$.
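As a hedged illustration of this organization, the sketch below groups cases by the set of characteristic items they contain, so that each distinct item set plays the role of one element $s_i$ of $S$. The dictionary-based case representation and the names `structure_category` and `partition_case_base` are assumptions made for this example and do not come from the paper.

```python
from collections import defaultdict


def structure_category(case):
    """Return the set of characteristic items a case contains (its s_i)."""
    return frozenset(case)


def partition_case_base(cases):
    """Group cases by structure category.

    `cases` is a list of dicts mapping characteristic item -> eigenvalue
    (a hypothetical representation chosen for this sketch).
    """
    categories = defaultdict(list)
    for case in cases:
        categories[structure_category(case)].append(case)
    return dict(categories)


# Three hypothetical cases: two share the item set {f1, f2}.
cases = [
    {"f1": 0.2, "f2": 0.9},
    {"f1": 0.4, "f2": 0.1},
    {"f1": 0.3, "f3": 0.7, "f4": 0.5},
]
groups = partition_case_base(cases)

# A target case is compared only with cases of its own structure category.
target = {"f1": 0.25, "f2": 0.8}
candidates = groups.get(structure_category(target), [])
print(len(groups), len(candidates))  # prints: 2 2
```

In this sketch the target case is matched against two candidates instead of all three stored cases, and the comparison takes place in a 2-dimensional rather than an $m$-dimensional space.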

Each $s_i$ represents a case structure category in the case base; the categories have different sizes and their content consists of elements of $F$.

From the point of view of retrieval efficiency, organizing cases in a heterogeneous case base amounts to processing the cases through dimensionality reduction. Dimensionality reduction not only reduces the complexity of calculating the similarity, but also maps the target case to a certain case structure category, so that it is matched only with cases of the same case structure category. This eases the decline in retrieval efficiency caused by the growth of the number of cases in the case base. From the point of view of building a target case, users are not aware of, or not interested in, all $m$ characteristic items, and they do not select all $m$ characteristic items when they build the target case. The organization method based on the heterogeneous case base can therefore describe the user's needs more accurately and also make the system easier to use.

III. THE TARGET CASE RECOMMENDATION BASED ON POST-RETRIEVAL

A. The Distribution Strategy of Characteristic Items Weight

The target case is composed of the characteristic items selected by the user, and the importance level of the characteristic items can be used to evaluate the merits of the target case. Each element in $F$ is given a weight that reflects how often it is used when cases are retrieved: the more often it is used, the greater the weight. After retrieval, users can therefore be recommended a new target case according to the characteristic items they selected and the weights of the elements in $F$. To this end, the paper designs a weight updating strategy based on the ant colony algorithm, which adjusts the weights of the elements in $F$ according to the characteristic items selected by the user.

In the 1990s, the Italian scholars M. Dorigo, V. Maniezzo, A. Colorni and others proposed a novel simulated evolutionary algorithm, the ant colony algorithm [11], by simulating the way ants search for paths in the natural world. The principle of the ant colony algorithm is that ants leave pheromone on the way while they search for food. Ants communicate and cooperate with other ants through the pheromone and find a shorter path. The more ants travel along a certain path, the greater the intensity of the pheromone on it, and ants tend to choose the direction in which the pheromone intensity is greater. The pheromone also evaporates automatically over time, which keeps the search from being trapped in a local optimal solution.

In terms of the ant colony algorithm, all the elements in the collection $F$ can be regarded as the global path, and the characteristic items selected by the user, collected to compose the newly added case, are regarded as ants. When the case retrieval is completed and the matching results are recommended to the user, the ants are deemed to have passed through the path. When the ants pass through the path, the weight collection $W$ corresponding to the collection $F$ is updated according to the characteristic items contained in $s_i$.

The weight updating algorithm of characteristic items is as follows:

(1) After the case retrieval is complete, extract the characteristic items collection $s_i = (f_1, f_2, \ldots, f_v)$ $(1 \le v \le m)$ contained by the case structure category.

(2) Distribute the initial weight $w_i = \frac{1}{v}$ $(1 \le i \le v)$ to all the elements in collection $s_i$.

(3) Update the weights of the elements in collection $F$. The updating formula is

$$w_i = \tau_i(t+1) = \rho\,\tau_i(t) + \Delta\tau_i, \qquad 1 \le i \le v.$$

In the formula, $\rho$ is a parameter with $0 < \rho < 1$, and $1 - \rho$ represents the evaporation factor of the characteristic item weights between the $t$-th and the $(t+1)$-th retrieval. $\tau_i(t)$ is the weight of the element of $F$ at the time of the $t$-th case retrieval, and $\Delta\tau_i$ is the pheromone left after one successful case match, with $\Delta\tau_i = (1 - \rho)\,w'$. For the elements of $F$ that were not distributed an initial weight in step (2), $w' = 0$; otherwise $w' = \frac{1}{v}$.
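A minimal sketch of steps (1) to (3), assuming the update reads as new weight = $\rho$ × old weight + $(1 - \rho)$ × $w'$, with $w' = 1/v$ for the $v$ items of the retrieved structure category and $w' = 0$ for every other item of $F$. The function name `update_weights`, the dictionary representation and the example values are hypothetical.

```python
def update_weights(weights, selected, rho=0.98):
    """One post-retrieval update of the global characteristic item weights.

    weights  : dict item -> current weight, assumed to sum to 1
    selected : items of the structure category s_i hit by the retrieval
    rho      : parameter with 0 < rho < 1; 1 - rho is the evaporation factor
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    selected = set(selected)
    if not selected or not selected <= set(weights):
        raise ValueError("selected must be a non-empty subset of the global items")
    v = len(selected)
    updated = {}
    for item, old in weights.items():
        # Step (2): initial weight 1/v for selected items, 0 for the others.
        w_prime = 1.0 / v if item in selected else 0.0
        # Step (3): keep a share rho of the old weight, add (1 - rho) * w'.
        updated[item] = rho * old + (1.0 - rho) * w_prime
    return updated


weights = {"f1": 0.25, "f2": 0.25, "f3": 0.25, "f4": 0.25}
new_weights = update_weights(weights, selected={"f1", "f2"}, rho=0.9)
print({item: round(w, 3) for item, w in new_weights.items()})
# prints: {'f1': 0.275, 'f2': 0.275, 'f3': 0.225, 'f4': 0.225}
```

Under this reading the weights keep summing to 1, because the evaporated share $1 - \rho$ is redistributed evenly over the $v$ selected items. A value of $\rho$ close to 1 makes the weights change slowly.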

B. Case Retrieval Log

After each case retrieval, the case retrieval log stores the generated target case and its similar cases in the form of a collection, and an inverted index is then generated over all cases in the space that corresponds to the case structure category to which the target case is mapped. Every case in the space maintains a list of the regions it belongs to after retrieval. The longer the list, the more often the case has been output. The result set recommended after a retrieval obeys the following definitions.

Definition 1: The similarity between the recommended cases and the target case must be larger than the prescribed threshold $sim'$.

When a case is retrieved, the target case corresponds, according to the characteristic items it contains, to one element $s_i = (f_1, f_2, \ldots, f_v)$ $(1 \le v \le m)$ of the collection $S$. Then, in accordance with the eigenvalues of its characteristic items, the target case is mapped to a point in $v$-dimensional space. After the target case has been matched against all cases in this space, the collection of similar cases is output. Therefore, after each retrieval, the target case and the collection of similar cases form a region whose center is the target case and whose distance from the center to the border is $dis = 1 - sim'$.

Definition 2: After a case retrieval, the region built by the target case and the collection of similar cases can be expressed as $area = (case, caselist, s)$, where $case$ represents the stored target case, $caselist$ represents the list of similar cases that were output, and $s$ represents the case structure category the region belongs to.

Definition 3: The inverted structure of a case is $ca = (c, arealist)$, where $c$ represents the case and $arealist$ represents the list of regions corresponding to the case; within the list, $area$ represents a region and $count$ represents the number of times the region occurred.

When a case is retrieved, the following rule must be followed.

Definition 4: If the similarity between the recommended case and the target case is larger than the prescribed threshold $sim''$, the two cases can be regarded as the same case.

In the case base, the distances between cases of the same case structure category are larger than $dis = 1 - sim''$. If the similarity between the target case and a case is larger than $sim''$, the user can be considered to have retrieved in the same region, and $count$ is increased by 1 after the retrieval is complete. Therefore, the cases in the same case structure category can be sorted in descending order of the number of regions each case is contained in. If two cases are contained in the same number of regions, they are sorted by the sum of the $count$ values of their regions. After sorting, the case collection contained in the space of this case structure category can be expressed as $s_i' = (c_1, c_2, \ldots, c_v)$ $(1 \le v < n)$.

C. Recommendation Ways of Target Case

Depending on the characteristic items and their eigenvalues, the target case recommended to the user after retrieval can be extracted by the following three methods.

Method 1: The recommended target case $targetcase''_1$ has the same case structure category as the target case $targetcase'$ generated through user retrieval, but the corresponding eigenvalues of $targetcase''_1$ and $targetcase'$ differ. This means that $targetcase''_1$ is mapped to the same space but to a different region. The eigenvalues of the characteristic items contained by $targetcase''_1$ are the eigenvalues of the characteristic items of the case with the smallest label in the sorted collection $s_i'$ contained in the corresponding space of $targetcase''_1$.

Method 2: From the collection $F$, extract the characteristic item $f_i$ that has the largest weight and is not contained in the target case $targetcase'$ generated through user retrieval, and build a new target case $targetcase''_2$ from $f_i$ and the characteristic items contained by $targetcase'$. The eigenvalues of the characteristic items contained by $targetcase''_2$ are the eigenvalues of the characteristic items of the case with the smallest label in the sorted collection $s_i'$ contained in the corresponding space of $targetcase''_2$. If the number of characteristic items contained by $targetcase'$ equals the number of elements in collection $F$, $targetcase''_2$ cannot be built.

Method 3: From the characteristic items contained by the target case $targetcase'$ generated by user retrieval, remove the characteristic item that has the smallest weight in collection $F$; the remaining items form $targetcase''_3$. The eigenvalues of the characteristic items contained by $targetcase''_3$ are the eigenvalues of the characteristic items of the case with the smallest label in the sorted collection $s_i'$ contained in the corresponding space of $targetcase''_3$. If the number of characteristic items contained by $targetcase'$ is 1, $targetcase''_3$ cannot be built.

IV. CASE RETRIEVAL STRATEGY

The case retrieval process consists of six steps. During retrieval, the following definition should also be fulfilled.

Definition 5: If the similarity between the target case and all the elements of the sub-case collection contained in the corresponding space exceeds the prescribed threshold $sim''$, the target case is not stored.

In line with Definition 2, in order to guarantee that the stored cases are representative in the space and to reduce the storage density of the space, a target case that is highly similar to cases already in the space is not stored.
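The role of the two thresholds can be illustrated with a short sketch. It assumes that the similarities between the target case and the cases of its space have already been computed (the paper does not fix a similarity function), and it reads Definition 5 as: the target case is stored only if no case in the space is more similar to it than $sim''$. That reading, the function names `recommend` and `should_store`, and the sample similarities are assumptions of this example.

```python
def recommend(similarities, sim_output):
    """Definition 1: ids of cases whose similarity exceeds sim', best first."""
    hits = [(cid, s) for cid, s in similarities.items() if s > sim_output]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in hits]


def should_store(similarities, sim_same):
    """Definition 5 (as read in this sketch): store the target case only
    when no case in the space has a similarity above sim''."""
    return all(s <= sim_same for s in similarities.values())


# Hypothetical similarities between a target case and four stored cases.
similarities = {"c1": 0.20, "c2": 0.62, "c3": 0.97, "c4": 0.41}

print(recommend(similarities, sim_output=0.35))   # prints: ['c3', 'c2', 'c4']
print(should_store(similarities, sim_same=0.95))  # prints: False
```

With the threshold values 0.35 and 0.95, three cases are recommended, and the target case is not stored because case `c3` already lies within the same region.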

(1) Extract the characteristic items contained in the newly generated target case $targetcase$, and map $targetcase$ into the corresponding space $s_i$.

(2) Match $targetcase$ with all cases in the space and calculate the distances $dis_i, dis_j, \ldots, dis_n$.

(3) Compare $dis_i, dis_j, \ldots, dis_n$ with $sim'$, sort the cases that meet Definition 1 by similarity and output them, and update the weights of all the elements in collection $F$. If there is a case in the space whose similarity to $targetcase$ is larger than $sim''$, update the $count$ value of the $area$ in that case's inverted structure $ca$.

(4) Determine whether the recommendation is successful according to the user feedback. If it is successful, compare $dis_i, dis_j, \ldots, dis_n$ with $sim''$ to determine whether $targetcase$ and all cases in the space meet Definition 5. If it is met, store $targetcase$ and update the inverted structure $ca$ of the cases in the corresponding space of $targetcase$ according to the new region generated by $targetcase$; the retrieval process then ends. If it is not met, $targetcase$ is not stored. If the recommendation fails, go to step 5.

(5) In accordance with the methods for extracting the recommended target case, recommend the target cases $targetcase''_1$, $targetcase''_2$ and $targetcase''_3$ to the user.

(6) On the basis of the recommended target cases $targetcase''_1$, $targetcase''_2$ and $targetcase''_3$, the user rebuilds the target case, and the process goes back to step 2.

The process of case retrieval is shown in Figure 1.

[Figure 1: flow chart covering user input, collection of characteristic items, generation of the target case, case similarity computation and filtering, the collection of similar cases, the success and store decisions, the update of the overall characteristic item weights and of the cases' inverted structures in the case retrieval log, the production of recommended target cases, and the user rebuilding the target case.]

Figure 1. The Flow Figure of Case retrieval

V. EXPERIMENTAL RESULTS AND ANALYSIS

A simulation experiment was developed for the algorithm. XML is used as the storage medium of the case collections, the size of the collection of global characteristic items is 12, and 5000 cases were added manually. The number of case structure categories is 30, and the experiment was run 600 times. The value of the parameter $sim'$ is 0.35, the value of the parameter $sim''$ is 0.95, and the value of the parameter $\rho$ is 0.98.

TABLE I. THE EXPERIMENT RESULT OF RETRIEVAL

Retrieval category | Success times | Failure times | Success rate of recommendation
First retrieval | 314 | 286 | 52.3%
Second retrieval | 122 | 164 | 42.7%

According to the experimental results in Table I, $sim'$ determines the quantity and quality of the first-time recommendation results; $sim''$ affects not only the quality of the stored cases but also the case retrieval logs; $\rho$ acts as the adjustment factor of the characteristic item weights and affects how fast the weights change. The success rate of the first recommendation depends on the quantity and quality of the cases in the case base, and repeated retrieval not only takes up system resources but also harms the user experience. Providing users with recommended target cases assists them in reconstructing the target case. The experiment records only the second retrieval. Since the 286 second-time retrievals correspond to the 286 first-time failures, 436 of the 600 queries (72.7%) ended in a successful recommendation. The user's secondary retrieval thus effectively improves the overall recommendation success rate of the system and increases the utilization rate of the cases in the case base.

VI. CONCLUSIONS

The efficiency of case retrieval and the quality of the results are important factors in judging the merits of an intelligent recommendation system. This paper applies post-retrieval query performance prediction to case retrieval in a case-based reasoning system and, in accordance with the user feedback after the initial case retrieval, supports users in reconstructing the target case. The experiment shows that the method gives good results in improving the utilization of cases and the recall rate after case retrieval. The next step of this research is to apply query performance prediction to the maintenance of the case base.


ACKNOWLEDGMENT

This work was supported in part by The Natural Science Research Project of Henan Province Education Department (2007 520008, 2007 520009).

REFERENCES
[1] Schafer J B, Konstan J A, Riedl J. Recommender Systems in E-Commerce. In: ACM Conference on Electronic Commerce (EC'99). 1999.
[2] McIvor R T, Humphreys P K. A case-based reasoning approach to the make or buy decision [J]. Integrated Manufacturing Systems, 2000, 11(5):295-310.
[3] Leake D B. CBR in Context: the present and future [A]. Case-Based Reasoning: Experiences, Lessons & Future Directions [C]. Menlo Park, CA, USA: AAAI Press/The MIT Press, 1996. 1-30.
[4] Luo Zhongliang, Wang Keyun, Kang Renke, Guo Dongming. Study on a Case Retrieval Algorithm in Case-based Reasoning System [J]. Computer Engineering and Applications, 2005, 25:230-232.
[5] Lang Hao, Wang Bin, Li Jin-Tao, Ding Fan. Predicting Query Performance for Text Retrieval [J]. Journal of Software, 2008, 19(2):291-300.
[6] Zhou Y, Croft WB. Ranking robustness: A novel framework to predict query performance. In: Proc. of the 15th ACM Int'l Conf. on Information and Knowledge Management. Arlington: ACM Press, 2006. 567-574.
[7] Cronen-Townsend S, Zhou Y, Croft WB. Predicting query performance. In: Proc. of the 25th Annual Int'l ACM SIGIR Conf. on Research and Development in Information Retrieval. Tampere: ACM Press, 2002. 299-306.
[8] He B, Ounis I. Inferring query performance using pre-retrieval predictors. In: Apostolico A, Melucci M, eds. String Processing and Information Retrieval, 11th Int'l Conf., SPIRE 2004. LNCS 3246, 2004. 43-54.
[9] Xu JX, Croft WB. Query expansion using local and global document analysis. In: Proc. of the 19th Annual Int'l ACM SIGIR Conf. on Research and Development in Information Retrieval. Zürich: ACM Press, 1996. 4-11.
[10] Jia Shi jie, Huang Song qing, Liu Li jun. Constructive strategy of non-isomorphic case set based on pheromone theory of ant colony algorithm [J]. Computer Engineering and Applications, 2008, 44(25):210-211.
[11] Zhang Jia-hua, Zhao Dong-dong, Jiang He, Zhang Jian chao. An Ant Colony Clustering Algorithm Based on Pheromone [J]. Computer Engineering and Applications, 2006, 20(2):157-163.
