Better predictions with a restricted training set (main European stock markets)
ARIMA stands for AutoRegressive Integrated Moving Average.
To better understand your time series data and to predict future points in the series (forecasting), you can fit the data to an ARIMA model after cleaning it (function tsclean from the R package “forecast”). Because there are many ARIMA models, each defined by its p, d and q parameters, and because people who need forecasts are usually not ARIMA specialists, the R package “forecast” provides an auto.arima function. auto.arima tries different ARIMA models and selects the “best” fit. “Best” here means “best for the vast majority of situations, not for all of them”, so a few specific situations still call for an ARIMA specialist.

ARIMA (and therefore auto.arima) is autoprojective: it uses the most recent data to compute what is essentially a weighted average of past values. So why do we need to select a restricted training set? (Methods for extracting “restricted data”, including my own, are compared in my previous post.) They say a picture is worth a thousand words, and a few examples are the best advocates for a restricted training set.

In the pictures:
- The test set is shown in black
- The training set is shown in blue
- The full data set appears as a sequence of grey, blue and black
- The forecast is shown in red
- The forecast prediction interval is bounded in orange

The following are one-year forecast trends for the European stock indexes DAX, CAC and FTSE. As you can see, the one-year forecast (mid-1997 to mid-1998) based on the auto.arima model is better with the RESTRICTED training set (left figure) than with the full data set. The data used are EuStockMarkets, which ships with R itself (package “datasets”).
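If you want to try the comparison yourself, here is a minimal sketch for the DAX series. It is not the extraction method from my previous post: the restricted training set below is simply the last 520 observations (roughly two years) before the test period, an arbitrary placeholder chosen only for illustration. Setting seasonal = FALSE is also an assumption, made to keep the model search fast on daily data.

```r
library(forecast)  # tsclean(), auto.arima(), forecast(), accuracy()

# EuStockMarkets ships with R (package "datasets"); keep the DAX column
dax <- tsclean(EuStockMarkets[, "DAX"])

h <- 260                      # about one trading year (the series frequency)
n <- length(dax)

# Hold out the last year as the test set
test       <- subset(dax, start = n - h + 1)
train_full <- subset(dax, end = n - h)

# ASSUMPTION: placeholder restriction, last 520 observations of the training data
n_restricted     <- 520
train_restricted <- subset(train_full,
                           start = length(train_full) - n_restricted + 1)

# Let auto.arima pick a model for each training set
# ASSUMPTION: non-seasonal search only, to keep the run time short
fit_full       <- auto.arima(train_full, seasonal = FALSE)
fit_restricted <- auto.arima(train_restricted, seasonal = FALSE)

# One-year-ahead forecasts
fc_full       <- forecast(fit_full, h = h)
fc_restricted <- forecast(fit_restricted, h = h)

# Compare out-of-sample error on the held-out year
rmse_full       <- accuracy(fc_full, test)["Test set", "RMSE"]
rmse_restricted <- accuracy(fc_restricted, test)["Test set", "RMSE"]
print(c(full = rmse_full, restricted = rmse_restricted))
```

Which training set wins depends on how the restricted window is chosen, so treat the printed numbers as a starting point and swap in your own extraction method for the placeholder. Calling plot(fc_full) and plot(fc_restricted) shows each forecast with its prediction interval.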