
International Journal of Economy, Management and Social Sciences, 2(9) September 2013, Pages: 746-751

TI Journals

International Journal of Economy, Management and Social Sciences

ISSN 2306-7276

www.tijournals.com

Forecasting Stock Returns using Support Vector Machine and Decision Tree: a Case Study in Iran Stock Exchange
Mahboobeh Shafiee *1, Hoda Majbouri Yazdi 2, Hamid Panahi 3, Hamid Hesari 4

1,2 Department of Accounting, Mashhad Branch, Islamic Azad University, Mashhad, Iran
3,4 Department of Accounting, Bojnourd Branch, Islamic Azad University, Bojnourd, Iran

ARTICLE INFO

Keywords:
Stock returns
Support vector machine
Decision tree

ABSTRACT

Most research on stock returns has investigated linear relationships between independent variables and stock returns using statistical methods. Modeling is now shifting from classical analysis built on a predefined model toward models developed directly from raw data. With the advancement of information technology and the entry of artificial intelligence techniques, including the support vector machine and the decision tree, into scientific research, it has become possible to examine nonlinear relationships between variables.

This study investigates the relationship between independent variables and stock returns using data mining techniques and asks whether a model can be built with these techniques to forecast stock returns. The studied population is the companies listed on the Tehran Stock Exchange from 2001 to 2008. Applying the data to the support vector machine, stock returns are forecast with an accuracy of 92.16%. A t-test with 9 degrees of freedom indicates that this result is better than that of the decision tree with a probability of almost one hundred percent.

© 2013 Int. j. econ. manag. soc. sci. All rights reserved for TI Journals.

1. Introduction

Investment methods vary widely. Regardless of the type and method of investment, two factors are the most important in financial decision making: the investor's prediction of the recoverable investment assets and the investment's real proceeds.
In any investment, the investor seeks a return and therefore tries to collect data about the future stock returns of companies. One of the most common ways to analyze financial information is to prepare financial ratios. Financial ratios summarize companies' financial reports and carry considerable information about a company's internal status. Since the hypothesis that stock returns are predictable was confirmed in financial management [3], one of the objectives of accounting information has been to help users predict future cash inflows to the enterprise and, consequently, investment returns. Some of the variables affecting companies' stock returns derive from the financial information provided by the accounting system. The effect of these data is very complex and partly unknown [2].
The development of information technology and the ability to collect and store very large volumes of data in most organizations create a need for theories and tools that help people extract useful information (knowledge) from these rapidly growing volumes of digital data (Fayyad et al., 1996). Because of the high potential of AI for processing large databases and finding complex nonlinear patterns in them, many studies have applied the support vector machine in different fields. Financial decisions, whose influencing variables are somewhat turbulent, are a suitable context for applying the support vector machine. The main research question is therefore whether a model can be built using the support vector machine technique to predict the stock returns of the companies listed on the Tehran Stock Exchange.
The rest of the paper reviews the research literature, describes the methodology, presents the research findings, and closes with conclusions and recommendations.

2. Research Background

Several studies have examined stock returns in various financial areas; some of them are summarized below.
Kanas and Yannopoulous (2001) examined the predictability of stock returns with nonlinear models. Their study estimated the nonlinear relationship between returns and variables such as interest rates, earnings and dividend yields to stock price ratio using a simple logarithmic model. The results showed a nonlinear relationship between these variables and stock returns [7].
* Corresponding author.
Email address: shafieem4@mums.ac.ir


Xing (2002) investigated the relationship between capital investment and stock returns in both cross-sectional and time-series data. This
paper used the capital asset pricing model. The results showed that investment is negatively related with future stock returns and future
stock returns are positively related with future investments [18].
Olsen and Mossman (2003) investigated the predictability of stock returns using financial ratios. They used a neural network model and the ordinary least squares technique, with stock returns as the dependent variable and accounting ratios as the independent variables. The results showed that neural networks give more acceptable predictions than the other techniques and significantly reduce prediction error [13].
Lewellen (2003) investigated the ability of financial ratios to predict stock returns. Using the capital asset pricing model, the study analyzed the predictive ability of the price-earnings ratio, the book value to market value ratio and dividend yields. The results showed that dividend yields can significantly predict stock returns, whereas the price-earnings ratio and the book value to market value ratio have little predictive ability [11].
Chun and Kim (2004) attempted to predict the rate of return and select portfolios using different data mining models, especially neural networks. The results showed that neural networks give more acceptable predictions than the other techniques and significantly reduce prediction error [4].
Omran and Ragab (2004) examined the linear and nonlinear relationships between financial ratios and stock returns using correlation analysis and multivariable regression. The results of the linear model showed that the return on shareholders' equity is the only ratio that can be used to predict stock returns. The results of the nonlinear models showed that the return on shareholders' equity and the return on assets predict stock returns better than the other ratios. Overall, nonlinear models describe the behavior of returns better than linear models [14].
Karyl et al. (2005) compared linear models for stock returns prediction (the Fama-French model, 1992) with nonlinear models (neural networks and genetic algorithms). The results show a significant difference between linear and nonlinear models and in the number of variables they use. Overall, nonlinear models are better than linear models [9].
Lee (2006) investigated the relationship between cash distribution and returns. The results show that this ratio has an important role in forecasting stock returns, risk-free returns and inflation-free returns. Regression analysis showed a direct relationship between this ratio and stock market returns [10].
Tun et al. (2008) used a combination of data mining and knowledge discovery methodology to present a model for stock price prediction. Their model combined Graphs k analysis to represent fluctuations of the stock price, the net return to satisfy the constraints, and an initiative self-organizing neural network for prediction [17].
Atsalakis and Valavanis (2009) studied the stock price using neuro-fuzzy networks. The results show that stock price prediction with this method is more accurate than with linear methods [1].
Ghalibaf Asl (2002) examined the relationship between stock returns on the Tehran Stock Exchange and exchange rates from 1996 to 2001. The results showed that changes in exchange rates have a negative effect on stock returns, but lagged changes in exchange rates have a positive effect on stock returns [6].
Raie and Chavoshi (2003) predicted the behavior of the stock returns of Behshahr Industries Development Company (BIDC) using multi-factor models and a neural network. The independent variables in this study were the Tehran Stock Exchange Price Index, the free-market exchange rate (dollar), the oil price and the gold price. The results demonstrate the superior performance of neural networks over the multi-factor model [15].
Namazi and Rostami (2006) investigated the relationship between financial ratios (liquidity, profitability, activity, returns and market ratios) and stock returns. They found a significant relationship between all financial ratios and stock returns, with profitability and liquidity ratios (including the current and quick ratios) showing a high correlation with stock returns [12].
In a study, Karami et al. (2006) evaluated "linear and nonlinear relationships between financial ratios and stock returns in Tehran Stock Exchange". The results suggest a linear relationship between financial ratios and stock returns [8].
Dostian (2007) investigated "the relationship between net profit changes and operating cash flow changes and stock returns changes". The research method is correlation analysis using multivariable linear regression. The results show that only net profit has a small correlation with stock returns [5].
Tehrani and Abbasion (2008) investigated the ability of artificial neural networks to predict short-term stock prices on the Tehran Stock Exchange using technical indicators. Their results showed that artificial neural networks can predict the sign of short-term stock price changes on the Tehran Stock Exchange in both bull and bear markets [16].


3. Methodology

This research is documentary: the analysis is performed on historical data from past periods.
The study mainly aims to investigate the application and performance of SVM in predicting stock returns. It tries to determine the criterion (dependent) variable from multiple predictor (independent) variables. In this method, stock returns ranging from -6 to 12 are classified into 20 classes, the data are divided into training and test groups, and SVM is applied to determine the effect of the independent variables on the dependent variable.

4. Statistical population and sampling

The study population comprises the companies listed on the Tehran Stock Exchange, from which companies are included or excluded for selection.
The sample used to discover the relationship between the various factors and stock returns includes all member companies whose data can be extracted from available sources.
This research was conducted on the companies listed on the Tehran Stock Exchange from 2001 to 2008 that meet the following criteria:

1) They have been listed on Tehran Stock Exchange at least from the beginning of the fiscal year 2000.
2) Sample companies are not among the investment and financial companies (banks).
3) Sample companies have not encountered a permanent stop from 2000 to 2008.
4) Sample companies have not changed their financial year from 2000 to 2008.

5. Dependent and independent variables

To study the effect of independent variables on the dependent variable (stock returns) and due to the unavailability of all necessary data, the
following ratios are selected as independent variables, which include:
Current ratio: division of current assets by current debts
Total debt to shareholders' equity ratio
Financial leverage (division of long-term debts by total shareholders' equity and long term debts)
Total Sales to Total Assets Ratio
Return on net value: division of net profit by shareholders' equity
Net operating cash flow to average total assets ratio
Net operating cash flow to shareholders' equity ratio
Net operating cash flow to operating profit ratio
Total asset turnover
Gross profit to sales (revenue) ratio
Inventory to current assets ratio
Operation cash flow to working capital ratio
Total debt to total assets ratio
Firm size (natural logarithm of book value of total assets)
Current debt to shareholders' equity ratio
Long-term debt to shareholders' equity ratio
Return on working capital (division of net profit by working capital)
Fixed assets to net value ratio
Current debt to shareholders' equity ratio
Return on assets (division of net profit by total assets)
Accounts receivable to sales ratio
Liquidity ratio (division of operating cash flows by current debt)
Profit to revenue ratio
Operating profit to sales ratio
Cash and short term investment to current debts ratio
Cash and short term investment to current assets ratio
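
As an illustration of how a few of the ratios defined above could be computed from balance sheet and income statement items, the following minimal Python sketch may help. The class `FirmYear`, its field names and the example figures are hypothetical and are not taken from the study's data; only the ratio definitions follow the list above.

```python
import math
from dataclasses import dataclass


@dataclass
class FirmYear:
    # Hypothetical field names; all amounts are in the same currency unit.
    current_assets: float
    current_debts: float
    long_term_debts: float
    total_assets: float
    shareholders_equity: float
    net_profit: float


def ratios(f: FirmYear) -> dict:
    """Compute a subset of the independent variables for one firm-year."""
    return {
        # Current ratio: current assets divided by current debts
        "current_ratio": f.current_assets / f.current_debts,
        # Financial leverage: long-term debts / (equity + long-term debts)
        "financial_leverage": f.long_term_debts
        / (f.shareholders_equity + f.long_term_debts),
        # Return on net value: net profit divided by shareholders' equity
        "return_on_net_value": f.net_profit / f.shareholders_equity,
        # Return on assets: net profit divided by total assets
        "return_on_assets": f.net_profit / f.total_assets,
        # Firm size: natural logarithm of book value of total assets
        "firm_size": math.log(f.total_assets),
    }


# Made-up example figures, for illustration only.
example = FirmYear(
    current_assets=400.0,
    current_debts=200.0,
    long_term_debts=100.0,
    total_assets=1000.0,
    shareholders_equity=400.0,
    net_profit=50.0,
)
features = ratios(example)
print(features)
```

The remaining ratios in the list could be added to the dictionary in the same way once the corresponding statement items are available.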

6. Results and Discussion

The primary objective of the study is to present a model for predicting stock returns of the companies listed on the Tehran Stock Exchange. The research question is whether a model that helps investors and other users in their decisions can be built using the support vector machine and the decision tree to predict stock returns of these companies.


[Figure 1 flowchart: Feature Selection → Labeling → 10-Fold Cross-validation → SVM Classifier]

Figure 1. The procedure of the algorithm proposed by support vector machine

For this research, 1,435 samples were collected from different sources (corporate financial statements). Each sample has 30 initial independent variables and one dependent variable (stock returns). To classify the samples, stock returns were divided into 20 classes, and each sample was given a label indicating which of the 20 return ranges it belongs to. Samples whose stock return was in the range [-6, 12] were classified into 20 classes of width 0.9: samples with a return in [-6, -5.1] were labeled 1, samples with a return in [-5.1, -4.2] were labeled 2, and so on up to the range [11.1, 12], which was labeled 20.
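
The labeling step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `label_return` is hypothetical, and because the text does not say which class a return lying exactly on a boundary belongs to, the sketch assumes that each boundary value goes to the lower class (so -5.1 is labeled 1).

```python
import math

LOW, HIGH, N_CLASSES = -6.0, 12.0, 20
WIDTH = (HIGH - LOW) / N_CLASSES  # 18 / 20 = 0.9


def label_return(r):
    """Map a stock return in [-6, 12] to a class label 1..20.

    Returns None for values outside the range. Boundary values are
    assigned to the lower class (an assumption, see the lead-in).
    """
    if not (LOW <= r <= HIGH):
        return None
    # The small tolerance absorbs floating-point error at class boundaries.
    k = math.ceil((r - LOW) / WIDTH - 1e-9)
    return min(max(k, 1), N_CLASSES)


for value in (-6.0, -5.1, -5.0, 0.0, 11.5, 12.0, 13.0):
    print(value, label_return(value))
```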
Correlation between variables
Table 1 lists the 30 independent variables in order of priority, according to their correlation with the dependent variable (stock returns):
Table 1. The importance of the independent variables

Priority  The name of independent variable
1         Firm Size
2         Return on working capital (net profit to working capital)
3         Long-term debt to shareholders equity ratio
4         Net operating cash flow to operating profit ratio
5         Current Ratio
6         Total asset turnover - asset turnover ratio
7         Sales to asset ratio
8         Current debt to shareholders equity ratio
9         Long-term debt to working capital ratio
10        Fixed assets to net value ratio
11        Debt to total assets ratio
12        Gross profit to sales
13        Return on net value (net profit to net value)
14        Operation cash flow to shareholders' equity
15        Profit to revenue ratio
16        Return on sales (net profit to sales)
17        Net operating cash flow to shareholders equity
18        Gross profit to revenue ratio
19        Operation cash flow to working capital ratio
20        Account receivable to sales ratio
21        Quick ratio
22        Net operating cash flow to average total assets
23        Return on assets (net profit to total assets)
24        Capital returns percentage
25        Liquidity ratio (operating cash flow to current debt)
26        Working capital to revenue
27        Inventories to current assets
28        Cash and short term investment to current assets
29        Operation cash flow to total debt
30        Cash and short-term investment to current debts

Statistical analysis of the data


After the 10-fold cross-validation assessment method is selected and the data are prepared (labeling the data and selecting the first M independent variables), the data are applied to the support vector machine as (X, label) pairs, where X holds the selected variables.
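
The 10-fold cross-validation split mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `k_fold_indices`, the shuffling and the seed are assumptions, and the classifier itself is left out. Each of the 1,435 samples appears in exactly one test fold, and the model would be trained on the other nine folds each time.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # shuffling is an assumption
    folds = [idx[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(1435, k=10))
sizes = [len(test) for _, test in splits]
print(sizes)  # five folds of 144 samples and five of 143
```

Averaging the per-fold test accuracies obtained this way gives a single figure like the mean row reported in the results table.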
Table 2. The results of applying data to Support Vector Machine Algorithm

Fold   Training Accuracy (%)   Test Accuracy (%)   Training time (seconds)   Test time (seconds)
1      99.72                   91.87               0.96                      0.19
2      99.72                   91.62               0.91                      0.31
3      100                     92.85               1.02                      0.27
4      100                     91.87               0.99                      0.19
5      100                     92.36               1.60                      0.30
6      100                     91.13               1.23                      0.30
7      100                     91.62               1.19                      0.29
8      100                     92.61               1.13                      0.42
9      100                     92.61               1.16                      0.38
10     100                     93.10               1.04                      0.24
Mean   99.94                   92.16               1.12                      0.29


Table 2 shows the superiority of the proposed approach in solving the problem. Applying the data to the support vector machine predicts stock returns with an accuracy of 92.16%. Another important point is the very short training and evaluation time of the proposed algorithm.
Solving using decision trees C4.5
In this method, the first three phases are the same as in the support vector machine solution, and only the recognition phase changes. The 1,435 samples and 15 independent variables shown in Table 1 were applied to the C4.5 algorithm; the results are shown in Table 3.
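
For readers unfamiliar with C4.5, the algorithm chooses each split by the gain ratio, that is, the information gain of a split divided by its split information. The sketch below shows this criterion for a single numeric variable and threshold. It is a simplified illustration, not the software used in the study; the function names and the toy data (six firm-size values with class labels 3 and 8) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split does not separate anything
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info


# Hypothetical toy data: firm size values and their return-class labels.
firm_size = [10.0, 10.5, 11.0, 12.5, 13.0, 13.5]
labels = [3, 3, 3, 8, 8, 8]
best = gain_ratio(firm_size, labels, 11.0)   # separates the classes perfectly
weak = gain_ratio(firm_size, labels, 10.0)   # isolates only one sample
print(best, weak)
```

A full tree is built by applying this criterion recursively to every candidate variable and threshold, which is where the tree height and number of rules in the results come from.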

Table 3. The results of applying data to algorithm C4.5

Fold   Training Accuracy (%)   Test Accuracy (%)   Tree height   Number of Rules   Training time   Test time
1      89.95                   87.54               21            190               0.38            0.02
2      85.50                   84.23               21            190               0.40            0.02
3      87.50                   86.81               19            188               0.36            0.02
4      83.24                   80.36               22            186               0.36            0.02
5      86.54                   79.12               18            187               0.35            0.02
6      90.58                   75.68               19            188               0.36            0.02
7      81.39                   76.45               20            186               0.36            0.02
8      85.25                   80.36               18            194               0.36            0.02
9      84.75                   79.95               14            193               0.34            0.02
10     81.65                   79.54               17            191               0.36            0.02
Mean   85.64                   81.00               19            189               0.36            0.02

The table shows that applying the data to the decision tree predicts stock returns with an accuracy of 81%.
10-fold cross-validation was used to compare the two techniques. With N = 10 folds, m the mean of the per-fold differences in test accuracy between SVM and C4.5 (expressed as fractions), and S^2 the sample variance of those differences, the value of t is obtained from the following equation:

$$ t = \frac{\sqrt{N}\, m}{S} = \frac{\sqrt{10} \times 0.1116}{\sqrt{0.001502}} \approx 9.1051 $$

Evaluating the cumulative distribution function of the t distribution with 9 degrees of freedom at this value of t gives approximately 1. This means that the SVM algorithm is better than the C4.5 algorithm with a probability of almost one hundred percent.
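
The comparison can be reproduced from the per-fold test accuracies reported in Tables 2 and 3. The sketch below assumes the statistic is the paired t statistic over the 10 folds, with m the mean and S^2 the sample variance of the per-fold accuracy differences; this reading is an assumption, chosen because it matches the figures 0.1116, 0.001502 and 9.1051 given in the text. The helper `t_cdf` is a hypothetical name; it integrates the Student t density numerically so that no third-party library is needed.

```python
import math
import statistics

# Per-fold test accuracies (%) copied from Table 2 (SVM) and Table 3 (C4.5).
svm_test = [91.87, 91.62, 92.85, 91.87, 92.36, 91.13, 91.62, 92.61, 92.61, 93.10]
c45_test = [87.54, 84.23, 86.81, 80.36, 79.12, 75.68, 76.45, 80.36, 79.95, 79.54]

# Differences expressed as fractions rather than percentages.
diffs = [(s - c) / 100.0 for s, c in zip(svm_test, c45_test)]
n = len(diffs)                      # N = 10 folds
m = statistics.mean(diffs)          # about 0.1116
s2 = statistics.variance(diffs)     # sample variance (N - 1), about 0.001502
t = math.sqrt(n) * m / math.sqrt(s2)


def t_cdf(x, df, steps=20000):
    """CDF of Student's t at x >= 0, by Simpson's rule on the density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


p = t_cdf(t, n - 1)  # 9 degrees of freedom
print(round(m, 4), round(s2, 6), round(t, 4), p)
```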

7. Conclusions and recommendations

Most studies on stock returns have discussed the relationship between an individual independent variable and stock returns. Unlike those studies, this study uses a set of variables. Using the support vector machine technique and the decision tree, it has tried to present a model that can predict stock returns of the companies listed on the Tehran Stock Exchange. The final models have accuracies of 92.16% (support vector machine) and 81% (decision tree). The researchers believe that this model can help investors in the capital market make better decisions.
The following recommendations are made for future research:
1- Other data mining techniques, such as genetic algorithm and clustering techniques, should be used to investigate stock returns, and the results should be compared with the results and the model presented in this study.
2- Since in inefficient markets data reach users after a time interval, the impact of data on stock returns is delayed. Thus, the relationship between financial ratios and stock returns should be examined taking this time interval in data transfer into account.


References
[1] Atsalakis & Valavanis (2009), "Surveying stock market forecasting techniques", An International Journal, Vol. 36, No. 7, PP: 10696-10707
[2] Babaeian, Ali; Arab Mazar, Mohammad (2000), "Analysis of the relationship between Balance Sheet items and stock returns changes in the companies listed on Tehran Stock Exchange", Master's thesis, Shahid Beheshti University
[3] Ball & Brown, P (1998), "An Empirical Evaluation of accounting income numbers", Journal of accounting research, Autumn, No. 6, Vol. 2, PP: 159-179
[4] Chun S. K, Kim S. H (2004), "Data mining for financial prediction and trading: application to single and multiple markets", Expert Systems with Applications, No. 26, Vol. 19, PP: 131-139
[5] Dostian, Sedigheh (2006), "Comparison of the relationship between net profit changes and operating cash flows and changes in stock returns of the companies listed on Tehran Stock Exchange", Master's Thesis, Al-zahra University
[6] Ghalibaf Asl, Hasan (2002), "Evaluation of exchange rate effect on firm value in Iran", Master's Thesis in Management, Management School, Tehran University
[7] Kanas, A, & Yannopoulous, A. (2001), "Comparing linear and nonlinear forecasts for stock returns", International Review of Economics and Finance, Vol. 10, PP: 383-398
[8] Karami, Gholam Reza; Moradi, Mohammad Taghi and Fereidoon (2006), "Evaluation of linear and nonlinear relationships between financial ratios and stock returns in Tehran Stock Exchange", Accounting and Auditing Reviews, No. 46
[9] Karyl, Q.C, et al (2005), "A Comparison between Fama and French's Model and Artificial Neural Network In Predicting The Chinese stock market", Computer and operations research, Vol. 32, PP: 2499-2512
[10] Lee, Q (2006), "Cash distribution and returns", University of Michigan, PP: 1-32
[11] Lewellen, Jonathan (2003), "Predicting Return with Financial Ratios", At Lewellen Gmit. Edu, No. 4
[12] Namazi, Mohammad; Rostami, Nooroldin (2006), "Evaluation of the relationship between financial ratios and stock returns of the companies listed on Tehran Stock Exchange", Accounting and Auditing Reviews, No. 44
[13] Olsen, Dennis and Charles Mossman (2003), "Neural Network Forecast of Canadian Stock Returns", International Journal of Forecasting, No. 19, PP: 453-465
[14] Omran, M, Ragab, A (2004), "Linear versus non-linear Relationships between Financial Ratios and stock Return", Review of Accounting & Finance, Vol. 3, No. 2, PP: 84-103
[15] Raie, Reza; Chavoshi, Kazem (2003), "Forecasting Stock Return in Tehran Stock Exchange, artificial neural networks model and multi-factor model", Financial Studies Quarterly, No. 15
[16] Tehrani, Reza; Abbasion, Vahid (2008), "Application of artificial neural network in scheduling stock transactions through technical analytical approach", Economic Studies Quarterly, Year VIII, No. 1, PP: 151-177
[17] Tun, S. Shu, L. and Kuo, C. (2008), "Knowledge discovery in financial investment for forecasting and trading strategy through wavelet-based SOM networks", Expert Systems with Applications, Vol. 34, No. 2, PP: 935-951
[18] Xing, Yuhan (2002), "Firm Investment and Expected Equity Returns", journal of global optimization, PP: 253-270
