
MULTI-STEP AHEAD FORECASTING VIA A HIERARCHICAL STRATEGY AND ADAPTIVE FUZZY SYSTEMS

IVETTE LUNA, ROSANGELA BALLINI


Department of System Engineering, School of Electrical and Computer Engineering, University of Campinas, CEP 13083-852, Brazil

Department of Economic Theory, Institute of Economics, University of Campinas, CEP 13083-857, Brazil

Emails: iluna@cose.fee.unicamp.br, ballini@eco.unicamp.br


Abstract. This work presents a top-down approach applied to multi-step ahead time series forecasting. A dataset of daily time series, with a fifty-six-steps-ahead forecasting task, is used to validate the method. The daily samples are aggregated, yielding a set of weekly time series. Weekly predictions are computed and then disaggregated into daily estimates using proportionality factors, which are derived from the behavior of each day of the week over time. Data pre-processing and input selection are the steps carried out before model adjustment. The adjusted models are based on evolving fuzzy systems, which are updated every time a new input-output pattern is presented. Prediction results are analyzed through a rolling-origin evaluation and compared with those obtained by daily models and by a combined top-down/bottom-up approach.

Keywords. Top-down approach, evolving fuzzy systems, fuzzy rule-based systems, time series forecasting.

Resumo. This work presents a top-down strategy for multi-step ahead time series forecasting. The set used for validation consists of daily time series to be predicted up to fifty-six steps ahead. The daily series are aggregated, yielding a dataset on a weekly basis. The models are adjusted on this dataset, and the predicted values are disaggregated through participation factors computed for each day of the week and for each time series. Pre-processing and input variable selection are steps considered in the adjustment of the time series models, which are based on adaptive/evolving fuzzy rule bases whose main characteristic is the automatic updating of their structure and parameters as new data are presented. The prediction results are analyzed for the set of time series using the rolling-origin technique and compared with the results obtained by models adjusted on a daily basis and by combined top-down/bottom-up approaches.

Palavras-chave. Top-down approach, evolving fuzzy systems, fuzzy rule-based systems, time series forecasting.

1 Introduction

Different approaches based on computational intelligence have emerged in recent years as an alternative for building effective predictors, owing to their non-linear nature, which maps the relation between independent and dependent variables. Usually, these models are based on neural networks, fuzzy systems and hybrid approaches, adopting an iterative adjustment of a single model through a sequence of offline parameter updates and considering, at each iteration, all the data available for this task.

Recently, evolving fuzzy systems have drawn great attention. The concept of evolving fuzzy systems introduced the idea of gradual self-organization and parameter learning in fuzzy rule-based models. Evolving fuzzy systems use information streams to continuously adapt the structure and functionality of fuzzy rule-based models. The basic learning mechanisms rely on on-line clustering and parameter estimation using least-squares algorithms. On-line clustering is based on the idea of data potential or data density. The evolving mechanism ensures greater generality of the structural changes, because rules are able to describe a number of data samples. Evolving fuzzy rule-based models include mechanisms for rule modification, replacing a less informative rule by a more informative one (Angelov, 2002). Overall, evolution guarantees gradual change of the rule-base structure while inheriting structural information. The ideas of parameter adaptation of rule antecedents and consequents are similar in the frameworks of evolving connectionist systems (Kasabov and Song, 2002) and of evolving Takagi-Sugeno (eTS) models and their variations (Angelov and Filev, 2005), (Luna et al., 2007a). Thus, the term evolving indicates their capability of automatically designing the input-space partition and of adapting to possible changes in the dynamics of the system.

These characteristics make evolving models an interesting approach for a large-scale prediction task such as the neural forecasting competition (NN5, 2008): a large number of time series with different features would otherwise require previous treatment to obtain accurate models, whereas here they can be processed directly by the model itself. In this paper we apply an evolving fuzzy system as part of a top-down approach for obtaining the 56 future daily predictions for the reduced (11 time series) and complete (111 time series) datasets of the Neural Forecasting Competition 2008 (NN5, 2008). The performance of the evolving system is compared

with those obtained by daily models and by a combined top-down/bottom-up approach.

2 The top-down strategy

Top-down forecasting (TD) is extremely useful for improving the accuracy of detailed forecasts (Lapide, 2006), because errors and variations may compensate and cancel each other out. Our proposal consists in solving an eight-steps-ahead task by aggregating the daily samples into their respective weekly values. Therefore, instead of dealing with the daily time series (the bottom level), we act on the aggregate weekly samples (the top level), following a top-down approach. After this transformation is applied to all the daily time series, trend and/or seasonal features are still evident for most of them, but on a weekly basis. Unlike the approach presented in (Luna et al., 2007a), in this study there is no need to work with stationary time series, because the evolving nature of the adjusted model lets it adapt to the variations detected over time in each time series.

Two different top-down approaches are evaluated; the main difference between them is the disaggregation step. These approaches are called the historical and the daily-model top-down approaches. Both are detailed in the following.

2.1 Historical top-down approach

In the historical top-down (TD-H) approach (Fig. 1(a)), weekly time series models are adjusted and used to calculate the values eight weeks ahead for each time series. The eight predictions are computed using the same forecasting model. The disaggregation of each weekly prediction \hat{A}_w^k into its respective seven daily predictions \hat{y}_j^k, j = 1, \dots, 7, is performed using the historical contribution of each day to the aggregate weekly sample. The coefficient factors based on historical data are obtained as follows:

\bar{c}_j = \mathrm{median}\{ c_j^k \mid k = 1, \dots, N_w \}    (1)
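The weekly aggregation and the historical contribution factors of Eqs. (1)-(2) can be sketched in Python as follows; this is a minimal illustration assuming the series spans a whole number of weeks, and the function names are ours, not the paper's:

```python
from statistics import median

def weekly_aggregate(daily, days_per_week=7):
    """Aggregate a daily series into weekly totals A_w^k (the top level).
    Assumes the series spans a whole number of weeks."""
    return [sum(daily[k:k + days_per_week])
            for k in range(0, len(daily), days_per_week)]

def contribution_factors(daily, days_per_week=7):
    """Median historical contribution of each weekday (Eqs. 1-2):
    c_j^k = y_j^k / A_w^k, then c_bar_j = median over the weeks k."""
    weekly = weekly_aggregate(daily, days_per_week)
    factors = []
    for j in range(days_per_week):
        c_j = [daily[k * days_per_week + j] / weekly[k]
               for k in range(len(weekly))]
        factors.append(median(c_j))
    return factors

daily = [10, 20, 30, 40, 50, 60, 70] * 2   # two identical toy weeks
print(weekly_aggregate(daily))             # [280, 280]
print(contribution_factors(daily))         # weekday shares of a week
```

Note that medians of per-week factors need not sum exactly to 1 in general; they do here only because the toy weeks are identical.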

by Eq. (1). Finally, the daily predictions under the TD-H approach are given by

\hat{y}_j^k = \hat{A}_w^k \, \bar{c}_j    (3)

for j = 1, \dots, 7 and k = 1, \dots, N_w.

2.2 Daily top-down approach

Fig. 1(b) illustrates the daily top-down approach (TD-DM), where daily models (DM) as well as weekly models are necessary. The individual forecasts given by the daily models are improved by adjusting each one with correction factors derived from comparing the aggregated weekly bottom-up forecast (based on the daily predictions obtained for each week) with its top-down counterpart (given by the weekly models). In this case, each daily prediction is corrected by multiplying it by the associated weekly prediction and dividing it by the sum of the seven daily components. Let the outputs of the daily models be \hat{y}_j^k for j = 1, \dots, 7, with k indexing the week, and let the output of the weekly model be \hat{A}_w^k. The weekly aggregate prediction estimated from the daily models is \hat{A}_d^k = \sum_{j=1}^{7} \hat{y}_j^k. Then, the final estimate for each day of the kth week under the TD-DM approach is computed as

\hat{y}_{j,final}^k = \hat{y}_j^k \, \hat{A}_w^k / \hat{A}_d^k,  j = 1, \dots, 7.    (4)
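Both disaggregation rules can be sketched as short functions; a minimal illustration with hypothetical names, not code from the paper:

```python
def td_h_disaggregate(weekly_forecast, c_bar):
    """TD-H disaggregation (Eq. 3): spread a weekly forecast over the
    seven days using the median historical contribution factors."""
    return [weekly_forecast * c for c in c_bar]

def td_dm_reconcile(daily_forecasts, weekly_forecast):
    """TD-DM correction (Eq. 4): rescale independent daily forecasts so
    that they sum to the weekly (top-level) forecast."""
    a_d = sum(daily_forecasts)          # bottom-up aggregate A_d^k
    return [y * weekly_forecast / a_d for y in daily_forecasts]

# toy week: a flat profile and a weekly forecast of 140
print(td_h_disaggregate(140.0, [1 / 7] * 7))   # 20.0 per day
print(td_dm_reconcile([10.0] * 7, 140.0))      # rescaled to sum to 140
```

The TD-DM correction leaves the shape of the daily forecasts untouched and only fixes their weekly total, which is exactly the reconciliation role described above.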

Section 5 presents the results obtained for both TD approaches, as well as those obtained directly by the daily models. In what follows, we present the treatment adopted for missing data, the input selection criterion applied to the daily and weekly time series models, and a brief description of the model and its learning algorithm.

3 Pre-processing and input selection

Some series contain periods with missing samples. To fill these periods we used the mean value for each day of the week. That is, we calculated the mean value of each day of the week for each time series, and each missing sample is given by

missing_sample(j, s) = m(j, s)    (5)
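The weekday-mean imputation of Eq. (5) can be sketched as follows, assuming the series starts on day j = 1 and marking missing samples with `None` (an implementation choice of ours, not the paper's):

```python
def fill_missing_by_weekday_mean(daily, missing=None):
    """Fill missing samples with the long-term mean of the same day of
    the week, m(j, s) in Eq. (5). Series assumed to start on day j = 1."""
    filled = list(daily)
    for j in range(7):
        observed = [v for v in filled[j::7] if v is not missing]
        m = sum(observed) / len(observed)      # long-term weekday mean
        for k in range(j, len(filled), 7):
            if filled[k] is missing:
                filled[k] = m
    return filled

series = [1, 2, 3, 4, 5, 6, 7,
          1, None, 3, 4, 5, 6, 7]
print(fill_missing_by_weekday_mean(series)[8])   # 2.0, mean of day-2 samples
```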

for j = 1, \dots, 7, where N_w is the number of weeks in the historical time series and c_j^k is the contribution factor of the jth day in week k (k = 1, \dots, N_w), calculated according to

c_j^k = y_j^k \Big/ \sum_{j=1}^{7} y_j^k = y_j^k / A_w^k    (2)

for k = 1, \dots, N_w, where A_w^k denotes the kth weekly observed value. In this way, the disaggregation in the TD-H approach takes as disaggregation factors the median contribution of each day, calculated over the available history and determined

where m(j, s) represents the long-term mean value of the jth day of the week (1 is Monday, 2 is Tuesday, etc.) and s indexes the daily time series, s = 1, \dots, 111. Even though this solution for treating missing data is quite simple, it showed better results during model adjustment than other approaches, such as linear correlation between time series and interpolation, at least for the time series without long-term trends. In general, the influence of possible errors inserted into the time series by the missing-data treatment is softened during the modeling stage under the TD-H approach, due to the

transformation applied to the problem. Zero values were not treated.

Figure 1: Top-down approaches: (a) top-down approach with disaggregation based on the historical contribution of each day of the week (TD-H); (b) top-down approach with correction factors based on daily and weekly models (TD-DM).

After data pre-processing, inputs are selected using a non-parametric approach known as the partial mutual information (PMI) criterion (Sharma, 2000). The advantages of the PMI criterion for input selection are its capability of detecting all linear and non-linear relations between input and output variables and its independence of the model structure (linear or non-linear). However, good estimates of the underlying functions depend strongly on the amount of data: the more data available, the better the estimates. More details can be found in (Abarbanel and Kennel, 1993), (Sharma, 2000), (Luna et al., 2006). In this paper, the PMI criterion was evaluated considering the first 8 lags of the weekly/daily time series as candidate input variables, taking into account only the in-sample dataset also used for model adjustment. The final set of inputs selected via PMI varied from one to eight across the adjusted models.

4 Evolving fuzzy system

The evolving fuzzy system was developed having as a starting point its constructive offline version, the C-FSM model, proposed in (Luna et al., 2007a) at the NN3 competition, where it was used for forecasting eighteen steps ahead of the reduced dataset. The fuzzy model is composed of a set of fuzzy rules and is mainly based on the first-order Takagi-Sugeno (TS) fuzzy system (Takagi and Sugeno, 1985). The general structure of the adaptive fuzzy inference system is illustrated in Fig. 2, where x^k = [x_1^k, x_2^k, \dots, x_p^k] \in R^p is the input vector at instant k, and \hat{y}^k \in R is the model output for the corresponding input x^k.
The input space, represented by x^k \in R^p, is partitioned into M sub-regions, each represented by a fuzzy rule; k = 0, 1, 2, \dots is the time index. The antecedents of each fuzzy If-Then rule R_i are represented by their respective centers
Figure 2: General structure of the A-FSM, composed of M fuzzy rules.

c_i \in R^p and covariance matrices V_i (p \times p). The consequents are represented by local models with outputs y_i^k, i = 1, \dots, M, defined by linear systems:
y_i^k = \varphi^k \, \theta_i^T    (6)

where \varphi^k = [1 \; x_1^k \; x_2^k \; \dots \; x_p^k] and \theta_i = [\theta_{i0} \; \theta_{i1} \; \dots \; \theta_{ip}] is the 1 \times (p+1) coefficient vector of the local linear model of the ith rule. Each input pattern has a membership degree associated with each region of the input-space partition, calculated through membership functions g_i(x^k) that depend strongly on the centers and covariance matrices of the fuzzy partition. The model output \hat{y}^k, which represents the predicted value for the future time instant k, is then calculated as a non-linear weighted average of the local outputs y_i^k with their respective membership degrees g_i^k, that is:

\hat{y}(x^k) = \hat{y}^k = \sum_{i=1}^{M} g_i^k \, y_i^k    (7)
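A minimal sketch of this inference scheme, for a one-dimensional input with Gaussian memberships; the rule parameters below (centers, spreads, local-model coefficients) are illustrative placeholders, whereas in the paper they come from the learning algorithm:

```python
import math

def ts_inference(x, centers, spreads, thetas):
    """First-order Takagi-Sugeno inference (Eqs. 6-7), 1-D sketch."""
    local = [t0 + t1 * x for (t0, t1) in thetas]        # y_i (Eq. 6)
    g = [math.exp(-0.5 * ((x - c) / s) ** 2)            # Gaussian g_i(x)
         for c, s in zip(centers, spreads)]
    total = sum(g)
    g = [gi / total for gi in g]                        # normalized degrees
    return sum(gi * yi for gi, yi in zip(g, local))     # y_hat (Eq. 7)

# two rules around x = 0 and x = 1, local models y = x and y = 1 + x
centers, spreads = [0.0, 1.0], [0.3, 0.3]
thetas = [(0.0, 1.0), (1.0, 1.0)]
print(ts_inference(0.0, centers, spreads, thetas))   # close to rule 1: ~0
print(ts_inference(1.0, centers, spreads, thetas))   # close to rule 2: ~2
```

Near each center, one membership dominates and the output approaches the corresponding local model; in between, the output blends the two local models smoothly.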

In order to adjust and validate the time series model, the dataset is divided into two parts: the


training dataset (the in-sample dataset) and the test dataset (the out-of-sample dataset). The learning algorithm used to adjust the model parameters follows a sequential approach. Because of that, the number of fuzzy rules codified in the structure varies over time, according to the changes or variations identified in the series as new information is processed at each time instant.

In order to start the adaptive learning, an initial model is necessary. The first N^0 input-output patterns of the training dataset are used for this task. The initial number of fuzzy rules, M^0, is determined in an unsupervised way using the subtractive clustering (SC) algorithm (Chiu, 1994). After M^0 is defined, all the other model parameters are initialized, considering the centers found by the SC algorithm and the conditions established by fuzzy theory for the membership functions.

After the model initialization, predictions can be calculated for the subsequent patterns k = N^0 + 1, \dots. The model parameters and structure are updated considering the error achieved at the next time instant and a recursive expectation-maximization (EM) algorithm, detailed in (Luna et al., 2007b). As mentioned in (Jacobs et al., 1991), an interesting feature of the EM algorithm is that it reduces not only the quadratic error but also its variance, within few iterations when compared to other optimization algorithms. Adding and pruning operators that are sensitive to these changes are applied in parallel, so that when something occurs that is not represented by the current model, a new rule is generated. On the other hand, when a rule is no longer necessary (it described something that occurred in the past and does not occur any more), the rule is pruned and eliminated from the model structure. Hence, the learning algorithm updates the model parameters and structure simultaneously.
This characteristic lets the model deal with trends and structural breaks that may be present in the time series. In this paper, the adaptive learning was applied from k = N^0, \dots, N, where N is the size of the training dataset. Algorithm 1 summarizes the learning process. Before the model adjustment, input-output patterns are normalized so that all components belong to the interval [0, 1]. Threshold parameters used during the initialization stage and the adaptive learning were selected via trial and error, keeping one set of values for all the weekly models and another for the daily ones. Daily and weekly models were modified during the adaptive learning, which was stopped after every sample of the training dataset had been processed. After that, the models are kept fixed for each forecasting origin considered, and the predictions over the test dataset are calculated.

Algorithm 1: Learning process

1. Build the N input-output samples using all the historical data available.
2. Initialization:
   (a) Determine the initial number of fuzzy rules M^0 over a reduced dataset (the first N^0 input-output samples, in general N^0 << N) using the SC algorithm;
   (b) Define initial values for the model parameters (spreads, centers, local models) for i = 1, \dots, M^0.
3. For k = N^0 + 1, \dots, N:
   (a) Estimate the next-step-ahead output and calculate the error achieved;
   (b) Update the model parameters and structure using the recursive EM algorithm and the pruning/adding operators defined in (Luna et al., 2007b).
4. Calculate the h future values required.
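The sequential add-or-update flavor of steps 2-3 can be illustrated with a deliberately simplified sketch: the paper's actual initialization (subtractive clustering) and update (recursive EM with pruning operators) are replaced here by a plain distance test for rule creation and a running update of the nearest center:

```python
def evolve_centers(stream, radius=0.3, rate=0.2):
    """Toy sequential structure learning in the spirit of Algorithm 1.

    For each incoming sample, either move the nearest rule center toward
    it (parameter update) or, if the sample is too far from every center,
    spawn a new rule (adding operator). Not the paper's algorithm."""
    centers = []
    for x in stream:
        if not centers:
            centers.append(x)
            continue
        d = [abs(x - c) for c in centers]
        i = d.index(min(d))
        if d[i] > radius:
            centers.append(x)                       # adding operator
        else:
            centers[i] += rate * (x - centers[i])   # parameter update
    return centers

# samples drift between two operating regions -> two rules emerge
print(evolve_centers([0.0, 0.05, 1.0, 1.02, 0.1]))
```

The point of the sketch is only the control flow: the rule base grows when the data leave the region covered by the current rules, mirroring how the evolving model adapts without revisiting past samples.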

5 Results

Two metrics were evaluated for the three approaches: the symmetric mean absolute percentage error (SMAPE) and the mean absolute error (MAE). The errors achieved by the weekly and daily models were calculated following a rolling-origin procedure, with the longest lead times equal to h = 8 and h = 56, respectively, where h denotes the length of the test period. As the forecasting origin is updated, the models are recalibrated. Hence, after predictions are issued for each forecasting origin, the model processes the new input-output pattern and updates itself through its sequential learning algorithm, which performs a single recalibration step, modifying its parameters and structure if necessary.

Table 1 shows the errors obtained by the weekly model for eight different origins over the test period of the complete dataset. The first column shows the relative forecasting origin, with N representing the size of the training dataset and N + k the respective forecasting origin within the test dataset. The second column shows the lead time for each origin. The third and fourth columns show the SMAPE and MAE obtained at each origin, respectively. The last two lines show the average errors over the whole test dataset under the rolling-origin evaluation, as well as the errors obtained over the in-sample dataset. Comparing the training average error with the average over all rolling origins, we observe that the results are consistent: there is no evidence of overfitting, and the influence of a fixed forecasting origin is also eliminated. Table 2 presents the results achieved by the two TD

Table 1: Weekly forecast errors for the complete dataset


Time origin (N+k)        Number of different forecasts   SMAPE   MAE
1                        8                               12.03   15.58
2                        7                               12.51   16.13
3                        6                               12.47   15.95
4                        5                               12.56   15.89
5                        4                               13.18   16.89
6                        3                               11.27   15.11
7                        2                                9.21   11.89
8                        1                                8.30   10.66
Total averages           36                              11.44   14.76
Total training averages                                  11.01   14.33

approaches, along with the results obtained by the daily models (DM) used independently. These daily forecast errors were calculated considering the last 56 realizations of the complete dataset as the testing period. Because of the long lead time required (56), Table 2 presents the errors obtained for a subset of the forecasting origins evaluated during the rolling-origin procedure (origins 1, 8, 15, 22, 29, 36, 43 and 50). These origins were chosen so that the results obtained by the TD approaches could be compared with those obtained by the daily models (DM) alone. The first column in Table 2 gives the eight forecasting origins considered for the comparison of the three approaches. The second column indicates the number of different forecasts estimated for each origin. The third and fourth columns present the SMAPE and MAE obtained by the DM over the complete dataset. The next columns show the SMAPE and MAE achieved by the TD-H and TD-DM approaches, respectively. Table 2 also presents the total average errors for the three approaches over the testing period, as well as the average training errors obtained by the daily models over the in-sample dataset (SMAPE = 19.73% and MAE = 3.22).

From Table 1 and Table 2 we note that the adaptive model achieved a better performance when adjusting the weekly time series than the daily ones, obtaining a total average SMAPE of 11.44% (see Table 1) against the 21.38% of the daily models. The better performance of the weekly models is also reflected in the out-of-sample dataset, with the TD-H approach obtaining a mean SMAPE of 18.51% against the 19.45% of TD-DM and the 21.38% achieved by the DM. Nevertheless, the combined TD-DM approach improved, on average, on the results obtained by the DM, showing that this kind of hierarchical approach can be an interesting alternative for increasing the forecast accuracy of independent bottom-level models, such as the DM in this paper.
Likewise, we can also observe that the TD-H approach showed, on average, slightly better performance, and with less computational effort, than the other approaches.
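The rolling-origin evaluation and the two error metrics can be sketched as follows. The SMAPE variant shown is one common definition, since the paper does not spell out its exact formula, and the naive last-value forecaster is only a stand-in for the evolving fuzzy model:

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent (a common definition; assumption)."""
    terms = [abs(f - a) / ((abs(a) + abs(f)) / 2.0)
             for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)

def rolling_origin(series, fit_predict, n_test, horizon):
    """Rolling-origin evaluation: from each origin in the test period,
    forecast the remaining lead time, then advance the origin one step."""
    n, scores = len(series), []
    for k in range(n_test):
        origin = n - n_test + k
        h = min(horizon, n - origin)           # lead time shrinks per origin
        preds = fit_predict(series[:origin], h)
        actual = series[origin:origin + h]
        scores.append((smape(actual, preds), mae(actual, preds)))
    return scores

# naive last-value forecaster as a stand-in for the model
naive = lambda history, h: [history[-1]] * h
print(rolling_origin([1, 2, 3, 4, 5, 6], naive, n_test=2, horizon=2))
```

This mirrors the tables above, where each later origin has fewer remaining forecasts, and the reported figures are averages over all origins.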

References

Abarbanel, H. and Kennel, M. (1993). Local false nearest neighbors and dynamical dimensions from observed chaotic data, Physical Review E 47: 3057-3068.

Angelov, P. and Filev, D. (2005). Simpl_eTS: A simplified method for learning evolving Takagi-Sugeno fuzzy models, Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1068-1073.

Angelov, P. P. (2002). Evolving Rule-Based Models: A Tool for Design of Flexible Adaptive Systems, Springer-Verlag, Heidelberg, Germany.

Chiu, S. (1994). A cluster estimation method with extension to fuzzy model identification, Proceedings of the IEEE International Conference on Fuzzy Systems, Vol. 2, pp. 1240-1245.

Jacobs, R., Jordan, M., Nowlan, S. and Hinton, G. (1991). Adaptive mixture of local experts, Neural Computation 3(1): 79-87.

Kasabov, N. K. and Song, Q. (2002). DENFIS: Dynamic evolving neural-fuzzy inference system and its application for time-series prediction, IEEE Transactions on Fuzzy Systems 10(2): 144-154.

Lapide, L. (2006). Top-down & bottom-up forecasting in S&OP, Journal of Business Forecasting 25(2): 14-16.

Luna, I., Ballini, R. and Soares, S. (2006). Técnica de identificação de modelos lineares e não-lineares de séries temporais, Sba: Controle & Automação - Sociedade Brasileira de Automática 17: 245-256.

Luna, I., Soares, S. and Ballini, R. (2007a). A constructive-fuzzy system modeling for time series forecasting, Proceedings of the International Joint Conference on Neural Networks.

Table 2: Daily forecast errors for the complete dataset


Time origin    Number of    Daily models (DM)    TD-H            TD-DM
(N+k)          forecasts    SMAPE    MAE         SMAPE   MAE     SMAPE   MAE
1              56           22.54    3.93        18.08   2.95    18.47   3.03
8              49           22.43    3.88        17.78   2.89    18.14   2.96
15             42           22.41    3.85        17.91   2.91    18.32   3.00
22             35           23.33    3.95        18.83   3.03    19.63   3.17
29             28           23.49    3.99        19.31   3.23    19.76   3.31
36             21           21.05    3.76        18.35   2.98    19.43   3.11
43             14           20.81    3.59        18.70   2.96    22.49   3.67
50             7            15.01    2.73        19.12   3.14    19.36   3.19
Total averages          252  21.38    3.87        18.51   3.01    19.45   3.18
Total training averages      19.73    3.22

Luna, I., Soares, S. and Ballini, R. (2007b). An adaptive hybrid model for monthly streamflow forecasting, Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1-6.

NN5 (2008). Neural forecasting competition.

Sharma, A. (2000). Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 1 - A strategy for system predictor identification, Journal of Hydrology 239: 232-239.

Takagi, T. and Sugeno, M. (1985). Fuzzy identification of systems and its applications to modeling and control, IEEE Transactions on Systems, Man and Cybernetics (1): 116-132.
