
FUZZ-IEEE 2009, Korea, August 20-24, 2009

Detection of Preference Shift Timing using Time-Series Clustering


Fuyuko Ito, Tomoyuki Hiroyasu, Mitsunori Miki and Hisatake Yokouchi
Fuyuko Ito is with the Graduate School of Engineering, Doshisha University, Kyoto, Japan, and is a research fellow of the Japan Society for the Promotion of Science (e-mail: fuyuko@mikilab.doshisha.ac.jp). Tomoyuki Hiroyasu and Hisatake Yokouchi are with the Department of Life and Medical Sciences, Doshisha University. Mitsunori Miki is with the Department of Science and Engineering, Doshisha University.

978-1-4244-3597-5/09/$25.00 ©2009 IEEE

Abstract: Recommendation methods help online users to purchase products more easily by presenting products that are likely to match their preferences. In these methods, user profiles are constructed according to past activities on the site. When a user accesses an e-commerce site, however, the user's preferences may change during the course of web shopping. We call this a "preference shift" in this paper. Conventional recommendation methods suppose that user profiles are static, and therefore cannot follow the preference shift. Here, a novel product recommendation method that responds to the preference shift is proposed. Such a method is intended to lead users to remain at the site longer than before. This paper discusses a method for detecting the preference shift timing using time-series clustering. In the proposed method, the products preferred by a user are clustered and the preference shift timing is detected as a change in the clustering results.

I. INTRODUCTION

There is increasing demand for e-commerce sites because they allow larger numbers of products to be presented than physical stores, providing vendors with increased sales opportunities and consumers with greater choice. Recommendation methods used in these services extract a user's profile based on the user's activity and present information that suits the obtained profile. For example, Amazon.com (http://amazon.com/) has attempted to increase sales opportunities by presenting products that are likely to be purchased by the user based on their purchase history. However, the user's preference may change during shopping on the web, a situation that we refer to here as a "preference shift." Conventional recommendation methods cannot follow the preference shift because they assume that user profiles are static. Moreover, increasing the time a user spends on the e-commerce site can increase sales opportunities. To lead users to remain at a site longer than before, it is necessary to update the user's preferences constantly and to be able to induce a preference shift by presenting certain products, because a user's purchase can be changed by visual priming on the e-commerce site [9], [11], [12].

In this paper, a method to detect the timing of the preference shift using time-series clustering is discussed. In the proposed method, the products preferred by a user are clustered and the timing of the preference shift is determined from changes in the features of the obtained clusters. The outline of this paper is as follows. The next section describes the user preference model in the proposed method. In section III, the preference shift is defined and we discuss ways of detecting the timing of the preference shift. In section IV, we discuss an experiment with artificial data performed to investigate which cluster features can be used to determine the preference shift timing. Finally, we present our conclusions in section V.

II. USER'S PREFERENCE MODEL FOR PRODUCT RECOMMENDATION

A. Preference Model in Conventional Recommendation Methods

In general, recommendation methods are classified into the following three types [1]: collaborative filtering [7], [19], content-based filtering, and hybrid approaches. In content-based filtering, it is necessary to build a model of a user's profile based on the preference information acquired from his/her purchase history. First, a target product is represented as a vector consisting of a number of features. There are two approaches to modeling a user's preference in the feature space: detecting preferred regions in the feature space, and representing suitability according to the user's preference as a fitness function. In the former approach, preferred regions are detected based on the user's preferred products, and the suitability of a given product according to this preference is predicted from the similarity between that product and another preferred product. In the latter approach, the selection of products to present is optimized by maximization of fitness. The input of the fitness function is a product and the output is the fitness of the product for the preference. However, the fitness function as the preference model is not known a priori. Therefore, some methods optimize product presentation by predicting the fitness function interactively based on the user's preferences [2], [10], [20].

In this study, the preference on the e-commerce site was defined as the tendency toward the user's ideal product. Hence, the preferred regions in the product feature space were identified interactively based on the products preferred by the user.

B. Detection of User's Preference in Feature Space using Clustering

In this study, the user's preference is defined as a set of preferred products, which are defined as those clicked by the user, with clicks serving as a metric of user interest. Meanwhile, a product is described as a feature vector in the feature space. For example, when clothes are the target products, a product is represented as a combination of the values of various features, such as color, material, sleeve length, etc. Nevertheless, the user's preference may have multiple tendencies (see Fig. 1). For this reason, the set of preferred products is clustered in the feature space so that multiple tendencies of the user's preferences can be obtained. The capability of this method has already been confirmed in a subjective experiment in which application of clustering to the preferred products was shown to be able to acquire the multiple preferences of the user [5], [6]. In that experiment, the multiple preferred regions were specified in the feature space from the clustering result and the products included in the specified regions were presented (see Fig. 2). The results verified that multiple preferences can be obtained appropriately based on clustering of the preferred products and that the products presented from the specified regions suited the user's preference.

Fig. 2. Detection of regions corresponding to the user's preference by clustering.

III. USER'S PREFERENCE SHIFT ON PRODUCT RECOMMENDATION

A. Time-Series Clustering

The preference shift on product recommendation is defined as the change in a user's tendency regarding the ideal product during web shopping. For example, a user may be looking for a dress to wear to a party on an e-commerce site. She may initially begin looking for a black dress, but notice vividly colored dresses while shopping. She may then begin to search for dresses that are pink, orange, green, blue, etc. If a dress is represented as a vector consisting of two elements, color and price, all dresses clicked by the user can be mapped to the feature space as shown in Fig. 3. Because the preference is obtained by clustering the clicked products, the preference shift appears as a change in the clustering result as products are clicked. Therefore, the clustering result of the clicked dresses varies as the search advances (see Fig. 3). However, the following items must be considered when clustering is applied to the time-series data.

• How to select the data for clustering
• How to suppress drastic changes in the clustering result

Fig. 1. Each region corresponds to one preference.

Fig. 3. An example of the preference shift.

The phrase "time-series clustering" in this paper means applying clustering to the data per unit of time, as shown in Fig. 3, and differs from the concept of clustering time-series data such as waveforms [17]. One of the simplest methods of time-series clustering is to apply clustering to all stored preferred data each time the user clicks a product. However, once data have been stored for a long time, the clustering result may not change even though small amounts of new data with characteristics different from most of the stored data are added. Therefore, it is necessary to select the data for clustering. Here, the sliding window technique is used: a fixed number of the newest data are selected as the sample data of the window. Moreover, when clustering is applied to the stored data independently each time the user clicks a product, it is possible that the cluster structure obtained before the click may be changed dramatically. Hence, in the proposed method, the constraints of past clustering results are added to the clustering of the current data to avoid drastic changes in the clustering result.

B. Detection of Preference Shift Timing

In the proposed method, each time the data of a preferred product are stored and the stored data are clustered, the clustering result is compared with the previous one to find the preference shift timing. Nevertheless, it is not known which feature of the cluster should be compared to detect the preference shift timing. For this reason, this paper discusses which features of the cluster can be used to judge when the clustering result has changed.

IV. DISCUSSION OF CLUSTER FEATURES FOR DETECTION OF THE PREFERENCE SHIFT TIMING

A. Experimental Overview

In this experiment, clustering was applied to the incremental time-series data of the preferred products and the features of the cluster were examined to determine the preference shift timing. The experimental data, the clustering algorithm, the method for identification of the relevance between two states of the same cluster, and the features of the cluster were as described below.

1) Experimental Data: The feature space of the data is a two-dimensional space and a datum $x$ is described as $x = (x_0, x_1)$, where $x_0$ ($0 \le x_0 \le 16$) and $x_1$ ($0 \le x_1 \le 16$) are real numbers. Four test data sets, each including 24 data ($1 \le t \le 24$), are generated by an agent implementing the following preferences. Each of the following preferences represents a possible model of the preference shift on an e-commerce site.

Fig. 4. The agent generates data in the regions defined as a user's preference.

Test data (1): Preference shift of a single preference. As shown in Fig. 4(a), the preferred region was set as region (1) in the first half of the search and as region (2) in the second half. First, twelve data were generated randomly and uniformly in region (1), and then an additional twelve data were generated in region (2) in the same way. Therefore, the preference shift timing of these test data was $t = 13$.

Test data (2): Preference shift of one of two preferences. In the first two thirds of the search, the preferred regions were set as regions (1) and (2) (see Fig. 4(b)). In the remaining third, regions (2) and (3) were set as the preferred regions. Thus, region (2) was preferred by the agent for the whole search. First, the agent generated eight and four data in regions (1) and (2), respectively. The order of generation in these regions was randomized. Then, four and eight data were generated in regions (2) and (3), respectively. The preference shift timing of these test data was $t = 14$ because the first datum generated in region (3) appeared at $t = 14$.

Test data (3): Simultaneous preference shifts of two preferences toward a new preference. The preferred regions were set as region (1) and region (2) in the first two thirds of the search, and eight data were generated randomly in each region (see Fig. 4(c)). In the remaining third, the preferred region was set as region (3) and the last eight data were generated randomly in region (3). Therefore, the preference shift timing of these test data was $t = 17$.

Test data (4): Without preference shift. The agent randomly generated 24 data in region (1) throughout the whole search. In these test data, the preference did not change (see Fig. 4(d)).
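The generation procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the rectangles in `REGIONS` are hypothetical, since the actual regions are given only graphically in Fig. 4, and the names `generate`, `SCHEDULES`, and `first_time_in_region` are introduced here for the sketch. Each schedule lists phases, and within a phase the order of regions is shuffled, which mirrors the randomized order described for test data (2).

```python
import random

# Hypothetical axis-aligned regions (x0_min, x0_max, x1_min, x1_max) inside
# the 16 x 16 feature space; the real regions are only drawn in Fig. 4.
REGIONS = {
    1: (1.0, 5.0, 1.0, 5.0),
    2: (10.0, 14.0, 10.0, 14.0),
    3: (10.0, 14.0, 1.0, 5.0),
}

# Each test data set is a list of phases; a phase maps region id -> count.
SCHEDULES = {
    1: [{1: 12}, {2: 12}],            # shift of a single preference
    2: [{1: 8, 2: 4}, {2: 4, 3: 8}],  # shift of one of two preferences
    3: [{1: 8, 2: 8}, {3: 8}],        # two preferences merge into a new one
    4: [{1: 24}],                     # no preference shift
}


def sample(region, rng):
    """Draw one point uniformly at random from the given region."""
    x0_min, x0_max, x1_min, x1_max = REGIONS[region]
    return (rng.uniform(x0_min, x0_max), rng.uniform(x1_min, x1_max))


def generate(schedule, rng):
    """Return a list of (region, point); list index i corresponds to t = i + 1."""
    data = []
    for phase in schedule:
        order = [r for r, n in phase.items() for _ in range(n)]
        rng.shuffle(order)  # randomize the order of regions within a phase
        data.extend((r, sample(r, rng)) for r in order)
    return data


def first_time_in_region(data, region):
    """Return the first time step t at which a datum from `region` appears."""
    for t, (r, _) in enumerate(data, start=1):
        if r == region:
            return t
    return None
```

For test data (1) and (3) the first datum of the new region always appears at $t = 13$ and $t = 17$, respectively, whereas for test data (2) the timing depends on the random order within the second phase.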

2) Clustering Algorithm: The algorithm for detecting communities in a network proposed by Newman [14] was employed and extended to handle weighted networks in this experiment. This method is a hierarchical clustering algorithm that seeks a division of the nodes of a network with a high density of within-cluster edges and a lower density of between-cluster edges by maximizing a quality function, the modularity $Q$. Because the division that maximizes $Q$ is selected, this method can automatically determine the number of clusters. Here, a $k \times k$ symmetric matrix $e$ is defined, whose element $e_{ij}$ is the number of all edges that link nodes in cluster $A_i$ to nodes in cluster $A_j$. Then, $a_i = \sum_j e_{ij}$ is calculated. Thus, $e_{ii}$ indicates the number of edges that link nodes in cluster $A_i$ to nodes in the same cluster, and $a_i$ describes the number of all edges emerging from nodes in cluster $A_i$. $Q$ is designed to emphasize the connections within a cluster and diminish the connections between clusters, as shown in the following equation.


Fig. 5. $d_c(t)$ is the distance between the centroids of two clusters. $d_S(t)$ is the difference between the spaces of two clusters.

$$Q = \sum_i \left( e_{ii} - a_i^2 \right) \qquad (1)$$
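As a rough illustration of Eq. (1), the modularity of a given division of a weighted network can be computed as below. The sketch assumes, as in Newman's formulation, that $e$ is normalized by the total edge weight so that its elements sum to 1; the function name `modularity` and its arguments are introduced here for illustration. It evaluates $Q$ for a fixed division only and does not perform the agglomerative search over divisions.

```python
import numpy as np


def modularity(weights, labels):
    """Evaluate Q = sum_i (e_ii - a_i^2) for one division of a weighted network.

    weights: symmetric (n, n) array of non-negative edge weights.
    labels:  length-n sequence assigning each node to a cluster.
    Assumption: e is normalized by the total weight, so its entries sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    labels = np.asarray(labels)
    total = w.sum()
    if total == 0:
        return 0.0
    q = 0.0
    for c in np.unique(labels):
        members = labels == c
        # Share of the total weight that lies inside cluster c.
        e_cc = w[np.ix_(members, members)].sum() / total
        # Share of the total weight attached to nodes of cluster c.
        a_c = w[members, :].sum() / total
        q += e_cc - a_c ** 2
    return q
```

A division that keeps strongly connected nodes together scores higher than one that splits them, which is the property the clustering algorithm exploits when it selects the number of clusters.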

In the proposed method, the clustering method mentioned above is applied to a weighted network in which the weight of an edge is the degree of relevance between two products, whereas in Newman's original method the relevance between two nodes is described only by the existence of an edge. The degree of relevance between two data, $x_i$ and $x_j$, is defined as the inverse of the distance between them in the feature space, as shown in the following equation.

$$\mathrm{Similarity}(x_i, x_j) = \frac{1}{\mathrm{Distance}(x_i, x_j)} \qquad (2)$$
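The edge weights of Eq. (2) can be sketched as follows. The paper does not state which distance is used, so Euclidean distance is an assumption here; the small constant `eps`, which guards against division by zero for coincident data, and the zero diagonal (no self-loops) are also choices made only for this sketch.

```python
import numpy as np


def similarity_matrix(points, eps=1e-9):
    """Return the (n, n) matrix of inverse pairwise distances.

    Assumptions: Euclidean distance; distances below `eps` are clamped to
    `eps`; the diagonal is set to 0 so that no datum is linked to itself.
    """
    x = np.asarray(points, dtype=float)
    diff = x[:, None, :] - x[None, :, :]       # pairwise coordinate differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    sim = 1.0 / np.maximum(dist, eps)
    np.fill_diagonal(sim, 0.0)
    return sim
```

With these weights, data that are close in the feature space are strongly linked, so a division that groups nearby products together has a high within-cluster weight.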

Fig. 6. Transitions of the sum of $d_c(t)$, the sum of $d_S(t)$, and $C(t)$ for test data (1).

Meanwhile, the latest $n$ data are utilized as the samples of a window for clustering. In this experiment, $n$, the number of sample data for a window, was set to 9. The constraint of past clustering results was not added in this experiment.

3) Identification of the Relevance between Two States of the Same Cluster: To verify the time-series variation of a certain cluster, it is necessary to identify which cluster $A_j(t_0 + \Delta t)$ at $t = t_0 + \Delta t$ is most relevant to the cluster $A_i(t_0)$ at $t = t_0$. In this study, the similarity between two clusters was computed by the auto-correlation function [16]:

$$C^{A}_{ij}(t_0 + \Delta t) = \frac{|A_i(t_0) \cap A_j(t_0 + \Delta t)|}{|A_i(t_0) \cup A_j(t_0 + \Delta t)|} \qquad (3)$$

Here, $|A_i(t_0) \cap A_j(t_0 + \Delta t)|$ is the number of data in common between $A_i(t_0)$ and $A_j(t_0 + \Delta t)$, and $|A_i(t_0) \cup A_j(t_0 + \Delta t)|$ is the number of data in the union of $A_i(t_0)$ and $A_j(t_0 + \Delta t)$. $C^{A}_{ij}(t_0 + \Delta t)$ is computed for all pairs of clusters at $t = t_0$ and $t = t_0 + \Delta t$, and pairs are identified as the same cluster in decreasing order of similarity. Here, $\Delta t$ is set as $\Delta t = 1$.

4) Features of Clusters: The following features of the cluster $A(t)$ are discussed to find the preference shift timing in this experiment. $d_c(t)$, $d_S(t)$, and $C(t)$ are features of the cluster $A(t)$ in the transition from $t - 1$ to $t$. The concepts of $d_c(t)$ and $d_S(t)$ are shown in Fig. 5.

• $d_c(t)$: Distance between the centroids of $A(t - 1)$ and $A(t)$
• $d_S(t)$: Difference between the spaces occupied by the data of $A(t - 1)$ and $A(t)$
• $C(t)$: Similarity of data between $A(t - 1)$ and $A(t)$
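The window selection, the cluster identification of Eq. (3), and the features listed above can be sketched together as follows. Several details are assumptions made only for this illustration: clusters are represented as sets of data indices $t$; pairs with no data in common are left unmatched; the centroid distance is Euclidean; and the "space" of a cluster is taken to be the area of its axis-aligned bounding box, because the paper illustrates $d_S(t)$ only graphically in Fig. 5. All function names are hypothetical.

```python
import numpy as np


def window_ids(t, n=9):
    """Indices of the latest n data at time t (data are numbered from 1)."""
    return list(range(max(1, t - n + 1), t + 1))


def overlap(a, b):
    """Eq. (3): |A ∩ B| / |A ∪ B| for two clusters given as sets of indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(prev, curr):
    """Pair clusters of two time steps in decreasing order of overlap.

    Each cluster is used at most once; pairs with zero overlap are skipped
    (an assumption of this sketch). Returns (i, j, similarity) triples.
    """
    scored = sorted(
        ((overlap(p, c), i, j)
         for i, p in enumerate(prev) for j, c in enumerate(curr)),
        key=lambda s: -s[0],
    )
    used_prev, used_curr, pairs = set(), set(), []
    for score, i, j in scored:
        if score > 0 and i not in used_prev and j not in used_curr:
            pairs.append((i, j, score))
            used_prev.add(i)
            used_curr.add(j)
    return pairs


def centroid_distance(points_prev, points_curr):
    """d_c(t): Euclidean distance between the centroids of two cluster states."""
    p = np.asarray(points_prev, dtype=float).mean(axis=0)
    c = np.asarray(points_curr, dtype=float).mean(axis=0)
    return float(np.linalg.norm(c - p))


def space(points):
    """Assumed measure of occupied space: area of the bounding box."""
    x = np.asarray(points, dtype=float)
    return float(np.prod(x.max(axis=0) - x.min(axis=0)))


def space_difference(points_prev, points_curr):
    """d_S(t): absolute difference between the spaces of two cluster states."""
    return abs(space(points_curr) - space(points_prev))
```

With $n = 9$ and $\Delta t = 1$, consecutive windows share eight indices, so matched states of an unchanged cluster have a high overlap, while $d_c(t)$ and $d_S(t)$ summed over the matched pairs give per-step values of the kind plotted in Figs. 6 to 9.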

B. Experimental Results and Discussion

First, the time-series variation of each cluster feature for test data (1), which represents a preference shift of a single preference, is discussed. Transitions of the sum of $d_c(t)$, the sum of $d_S(t)$, and $C(t)$ are shown in Fig. 6, where the horizontal axes indicate the time $t$. Clustering is applied from $t = 9$ because the number of sample data for the window is nine. Figure 6(a) shows that $d_c(t)$ and $d_S(t)$ increased rapidly at $t = 13$, which is the preference shift timing set for test data (1). Therefore, the variations of $d_c(t)$ and $d_S(t)$ may indicate the change in the clustering result. On the other hand, $d_c(t)$ and $d_S(t)$ also increased at $t = 20$ because the data that suited the preference in the early steps disappeared from the window. For this reason, the number of sample data for a window should be discussed further in future studies. Moreover, it is difficult to determine the preference shift appropriately with $C(t)$ because the data included in each cluster change rapidly (see Fig. 6(b)).

Second, we discuss the variation in the sum of $d_c(t)$ and the sum of $d_S(t)$ for test data (2), as shown in Fig. 7. In these test data, $d_c(t)$ and $d_S(t)$ increased simultaneously at $t = 14$, when the preference shifted. However, these increments were small in comparison with the increments at $t = 17$. Two data in region (2) were very close to each other in a cluster at $t = 13$ (see Fig. 4), and a datum in region (3) was added to this cluster at $t = 14$ due to the preference shift. Because these three data were close to each other, the centroid of the cluster moved only slightly and the space covered by the data of the cluster was approximately the same as before. This result indicates that the distribution or the covariance of the data in each cluster must be considered in future studies.

Fig. 7. Transitions of the sum of $d_c(t)$ and the sum of $d_S(t)$ for test data (2).

Next, the time-series variation of each cluster feature for test data (3) is shown in Fig. 8. These test data represent simultaneous preference shifts of two preferences toward a single preference. $d_c(t)$ and $d_S(t)$ increased rapidly at $t = 17$, which is the preference shift timing set for these test data. As with test data (1), it is possible to detect the preference shift timing based on the time-series variation of $d_c(t)$ and $d_S(t)$. However, $d_c(t)$ and $d_S(t)$ also increased at $t = 20$ and $t = 24$ because the last datum that suited the initial preference was merged into other clusters. Therefore, the number of sample data for a window and the constraint of the past clustering result must be considered.

Fig. 8. Transitions of the sum of $d_c(t)$ and the sum of $d_S(t)$ for test data (3).

Finally, the time-series variation of each cluster feature for test data (4) is shown in Fig. 9. These test data showed a consistent preference with no preference shift, so it must be confirmed whether $d_c(t)$ and $d_S(t)$ increased or not. Figure 9 shows that $d_c(t)$ and $d_S(t)$ remained consistent in comparison with Figs. 6, 7, and 8.

Fig. 9. Transitions of the sum of $d_c(t)$ and the sum of $d_S(t)$ for test data (4).

Overall, it is possible to detect the preference shift timing according to rapid increases in the distance between the centroids, $d_c(t)$, and the difference between the spaces, $d_S(t)$. Moreover, these features do not change markedly when a preference shift does not occur. Nevertheless, the distribution or the covariance of data within a cluster must be discussed further in future studies.

V. CONCLUSIONS

The purpose of this study was to increase sales opportunities by detecting the preference shift on e-commerce sites and its triggers. In this paper, a method that applies time-series clustering to preferred products was proposed to detect the preference shift timing. The features of the cluster were also discussed using four sets of artificial test data to determine when the clustering result had changed. The experimental results showed that the preference shift timing could be detected according to the time-series variations of the distance between the centroids and the difference between the spaces of two states of the same cluster. In future studies, the number of sample data for a window and the application of constraints from past clustering results should be discussed. Eventually, the capability of the proposed method to detect the preference shift timing of actual users should also be assessed in subjective experiments.

REFERENCES
[1] Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, Vol.17, No.6, pp.734-749 (2005)



[2] Aoki, K., Takagi, H.: Interactive GA-based design support system for lighting design in 3-D computer graphics. The Transactions of the Institute of Electronics, Information and Communication Engineers, Vol.81, No.7, pp.1601-1608 (1998)
[3] Fukui, K., Saito, K., Kimura, M., Numao, M.: Visualizing Dynamics of the Hot Topics Using Sequence-Based Self-Organizing Maps. Lecture Notes in Artificial Intelligence, Vol.3684, pp.745-751 (2005)
[4] Fukui, K., Saito, K., Kimura, M., Numao, M.: Compilation to Visualize the Dynamic Clusters by the Adapted Self-Organizing Network. Transactions of the Japanese Society for Artificial Intelligence, Vol.23, No.5, pp.319-329 (2008)
[5] Ito, F., Hiroyasu, T., Miki, M., Yokouchi, H.: Discussion of Offspring Generation Method for Interactive Genetic Algorithms with Consideration of Multimodal Preference. Simulated Evolution and Learning, Lecture Notes in Computer Science, Springer, Vol.5361, pp.349-359 (2008)
[6] Ito, F., Hiroyasu, T., Miki, M., Yokouchi, H.: Offspring Generation Method for interactive Genetic Algorithm considering Multimodal Preference. Transactions of the Japanese Society for Artificial Intelligence, Vol.24, No.1, pp.127-135 (2009)
[7] Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R., Riedl, J.: GroupLens: applying collaborative filtering to Usenet news. Communications of the ACM, Vol.40, No.3, pp.77-87 (1997)
[8] Kleinberg, J.: Bursty and hierarchical structure in streams. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.91-101 (2002)
[9] Koufaris, M., Kambil, A., LaBarbera, P.A.: Consumer Behavior in Web-Based Commerce: An Empirical Study. International Journal of Electronic Commerce, Vol.6, No.2, pp.115-138 (2002)
[10] Llorà, X., Sastry, K., Goldberg, D.E., Gupta, A., Lakshmi, L.: Combating user fatigue in iGAs: partial ordering, support vector machines, and synthetic fitness. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp.1363-1370 (2005)
[11] Mandel, N., Johnson, E.J.: When Web Pages Influence Choice: Effects of Visual Primes on Experts and Novices. Journal of Consumer Research, Vol.29, No.2, pp.235-245 (2002)
[12] Mandel, N., Nowlis, S.M.: The Effect of Making a Prediction about the Outcome of a Consumption Experience on the Enjoyment of That Experience. Journal of Consumer Research, Vol.35, No.1, pp.9-20 (2008)
[13] Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Physical Review E, Vol.69, Issue 2, 026113 (2004)
[14] Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Physical Review E, Vol.69, Issue 6, 066133 (2004)
[15] Palla, G., Derenyi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature, Vol.435, No.7043, pp.814-818 (2005)
[16] Palla, G., Barabasi, A.L., Vicsek, T.: Quantifying social group evolution. Nature, Vol.446, No.7136, pp.664-667 (2007)
[17] Ratanamahatana, C.A., Keogh, E.: Making Time-series Classification More Accurate Using Learned Constraints. In: Proceedings of SIAM International Conference on Data Mining (SDM 04), Lake Buena Vista, Florida, pp.11-22 (2004)
[18] Sakaki, T., Matsuo, Y., Ishizuka, M.: Topic Extraction from Scientific Paper Database. In: Proceedings of the Annual Conference of the Japanese Society for Artificial Intelligence, Vol.20, 1A1-1 (2006)
[19] Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, pp.285-295 (2001)
[20] Takagi, H.: Interactive evolutionary computation: Fusion of the capabilities of EC optimization and human evaluation. Proceedings of the IEEE, Vol.89, No.9, pp.1275-1296 (2001)
