
DOI 10.4010/2016.1059
ISSN 2321 3361 © 2016 IJESC

Research Article

Volume 6 Issue No. 4

Artificial Immune System Based Organizational Data Prediction


D. Asir Antony Gnana Singh1, E. Jebamalar Leavline2, T. Nithya3, S. Nivetha4
Department of CSE1, 3, 4, Department of ECE2
Anna University, BIT Campus, Tiruchirappalli, India
Abstract:
This paper proposes an organizational data prediction system using artificial immune system (AIS) based feature selection.
Feature selection is the process of selecting significant features from a dataset. The performance of the proposed method is
evaluated using various classifiers, namely the function based radial basis function (RBF), the rule based JRIP, and the tree
based J48. Classification accuracy is used as the evaluation metric to assess the usefulness of the proposed feature selection
method. Experiments are conducted on various datasets to observe the performance of the proposed method in terms of
classification accuracy. The experimental results show that the proposed method performs better than the other methods compared.
Keywords: Artificial immune system, organizational data prediction, data mining, classification, classification accuracy.
I. INTRODUCTION
In general, most people depend on organizations for their
livelihood. The economic growth of a country mainly depends
on the development of goods and services. Therefore,
organizations play a vital role in improving the economy of
our nation. Organizations can be classified into two sectors,
namely the production sector and the service sector. The
production sector, also called the manufacturing sector, is
concerned with the development of products such as vehicles,
mobiles, laptops, etc. The service sector is concerned with
rendering services such as education, banking, medical care,
etc.
Prediction plays a significant role in organizations. Examples
include predicting the loan repayment capacity of a loan
applicant in the banking sector, predicting the buying pattern
of customers in a shopping mall or an online shop, and
predicting or detecting faults or defects in the manufacturing
or production units of an organization. Misprediction leads to
material or physical losses. Therefore, improving the accuracy
of prediction is essential to prevent losses and to improve the
quality of production or services. The economy of our nation
can in turn be improved through better quality in production
and services.
Therefore, this paper aims to improve the accuracy of an
organizational data prediction system using the proposed
artificial immune system based feature selection, which
removes irrelevant and redundant features, since such features
degrade the accuracy of the predictive model. The proposed
feature selection algorithm for improving the accuracy of
organizational data classification is tested on various real
world datasets. The performance of the proposed method is
evaluated with different classification algorithms, namely the
function based classifier radial basis function (RBF), the rule
based classifier JRIP, and the tree based classifier J48. The
rest of the paper is organized as follows: Section II reviews
the literature, Section III presents the proposed method,
Section IV details the implementation and experimental setup,
Section V illustrates and discusses the results, and Section VI
concludes this paper.

International Journal of Engineering Science and Computing, April 2016

II. LITERATURE REVIEW


This section discusses various research works that are related to
the proposed method. Seokhwan Yang et al discussed a
prediction model based on big data analysis using hybrid FCM
clustering, which performs automatic classification without
external interference and combines the advantages of both
supervised and unsupervised learning [1].
Hua Yang et al implemented artificial immune system based
intrusion detection for computer security. Intrusion detection is
one of the techniques that attempt to recognize critical access
to computers by analyzing various connections [2].
Susmita Ganguli et al designed an artificial immune system
based image enhancement technique that solves the underlying
optimization problem. The results were compared with other
standard techniques such as histogram equalization and linear
contrast stretching [3]. Zhonghua Li et al presented an adaptive
hierarchical artificial immune system (AHAIS) and its
application to radio-frequency identification (RFID) reader
collision avoidance. An optimal scheduling based application
model for the RFID reader-to-reader collision avoidance
problem was used to validate the effectiveness of AHAIS. The
experimental results indicated that AHAIS optimization can
obtain the global optimum more quickly and more accurately
than differential evolution (DE) and particle swarm
optimization (PSO) [4].
Kung-Jeng Wang et al presented an improved artificial immune
recognition system (AIRS) with the opposite sign test for
feature selection. Their AIRS outperformed all compared
methods in accuracy, and the authors concluded that the
approach can serve as an effective and efficient feature
selection technique [5].
Sridevi et al proposed AIS based intrusion detection with
Fisher score feature selection. Their results show that the
classification accuracy and recall of the proposed system were
better than those of systems without a feature reduction
technique [6].

http://ijesc.org/

Neda Soltani et al presented a novel structure for credit card
fraud detection using AIS. They used the protected method
stimulated algorithm and improved it for the detection of frauds
[7].
Junyuan Shen et al presented improved AIS based network
intrusion detection using rough set theory. The model was
tested on the widely used KDD CUP99 dataset. The results
show better detection accuracy for the proposed scheme than
for the other schemes compared [8].
Liangpei Zhang et al presented an application of artificial
immune systems in remote sensing image classification. This
proposed classification algorithm has high classification
precision. It is a good and efficient classification algorithm and
can be applied to remote sensing image classification [9].
Arij Masmoudi et al developed an artificial immune system for
public transport regulation support system which relies on the
clonal selection algorithm to find optimized solutions to the
transportation regulation problem [10].
B. M. Vidyavathi et al presented a novel hybrid filter feature
selection method for data mining. The authors report that the
method is superior to some classical feature selection methods
and achieves higher prediction accuracy with fewer features
[11].
Nilsson et al noted that the most distinguishing characteristic
of filters is that the relevance index is calculated based solely
on a single feature, without considering the values of other
features. In contrast, their approach no longer assumes
orthogonality of features and searches the feature space in a
recursive way to efficiently test conditional independence [12].
Kohavi and John proposed that wrappers can be used to search
through the possible subsets of features and explore the mutual
information between features [13].
Efron et al implemented an embedded method that, in contrast
to wrappers, does not separate learning from feature selection.
The selected features are sensitive to the structure of the
underlying classifier. For this reason, in most cases, the
features selected by one embedded method might not be
suitable for other classifiers [14].
Baesens and Newman studied feature selection for marketing
applications and argued that feature selection is important in
this area to avoid the cognitive overload that excessively large
feature sets impose on decision makers [15].
Jialei Wang et al discussed online feature selection and its
applications. Most online learning studies require access to all
the features, which is not always appropriate, especially when
handling high dimensional data or when it is expensive to
acquire all the features. To overcome this limitation, they
investigated online feature selection (OFS) [16].
Toefilo and Roberto presented some advances in feature
selection techniques with application to face recognition. They
observed that a robust and compact representation, which
would allow fast identification of unique individuals, is still
to be found. Solving this problem requires dimensionality
reduction methods and new feature selection methods with
tolerance based fuzzy distance [17].
Erick et al developed a feature selection approach for scientific
applications. In many cases the data contain irrelevant and
redundant features that affect the accuracy of the induction
algorithms. This causes problems of size and dimensionality in
fields such as astronomy and remote sensing. The authors used
efficient feature selection methods to overcome these
difficulties [18].
Al Enezi et al reviewed artificial immune system applications
in cancer research. They focused on the application of AIS
techniques to cancer research, specifically for predicting the
recurrence of cancer in patients [19].
Gandhi et al developed a system for the prediction of heart
disease using data mining (DM) techniques, in which different
DM techniques can be utilized [20].
Iba and Sasaki presented a genetic programming (GP) approach
to predict financial data and described how effectively GP can
be applied to such prediction so as to increase income. Related
experiments were conducted with neural networks to
demonstrate the efficiency of the GP based method [21].
Maral Haghighat et al conducted a review of data mining
techniques for result prediction in sports. The low prediction
accuracy highlighted the need for further research to obtain
reliable predictions. Prediction accuracy can be improved
through the use of machine learning and DM techniques [22].
Jyoti Soni et al designed a predictive data mining system for
medical diagnosis and reported that the Bayesian classifier
achieves the same level of accuracy as the decision tree, and
that both outperform other methods such as K-nearest neighbor
(KNN) and neural networks. The accuracy of the decision tree
and the Bayesian network is further improved after applying a
genetic algorithm to reduce the data size to an optimal subset
of attributes sufficient for heart disease prediction [23].
Habib Shah et al presented Boolean function prediction using a
hybrid ant bee colony algorithm. They noted that back
propagation is too slow for many applications and tends to get
trapped in local minima. To overcome this limitation, a hybrid
of ant and bee agent behaviors was used to train the neural
network. According to the experimental results, the hybrid
algorithm improved the prediction and classification accuracy
on the volcano time series data used to train the multilayer
perceptron (MLP) [24].
Chuanliang Chen et al developed an artificial immune system
for deoxyribonucleic acid (DNA) microarray data analysis. The
development of microarray technology has supplied a large
volume of data for the prediction and diagnosis of cancer. The
authors also presented microarray data prediction based on an
improved version of the information gain attribute selection
method [25].
Weixiang Zhao and Cristina Davis developed a modified AIS
based pattern recognition approach for clinical application that
enhances the recognition ability of the conventional immune
system based classification approach, and demonstrated the
superiority of the new method through two case studies of
breast cancer diagnosis [26]. D Asir et al presented several
approaches to improve the accuracy of classifiers [27-33].
III. PROPOSED METHOD
The schematic diagram of the proposed method is illustrated
in Figure I. The proposed method consists of several phases.
In the initial phase, real world organizational data are
collected. The collected datasets are then given to the
proposed artificial immune system (AIS) based feature
selection algorithm, which selects the significant features
from the given dataset. A classifier is then built for data
prediction. Finally, the classifier is evaluated on the test
dataset to determine its predictive accuracy on the
organizational data.
A. Proposed algorithm
The flowchart representation of the proposed algorithm is
depicted in Figure II. The clonal selection algorithm is a
type of artificial immune system approach inspired by the
immunological theories that describe the function and
characteristics of the mammalian immune system. It is a
stochastic algorithm.

FIGURE I
SCHEMATIC DIAGRAM OF PROPOSED METHOD
Proposed Algorithm:
Step 1: Initially, the maximum number of attributes Ma
of the given dataset is calculated. The clone
point Cp is calculated as ceil(Ma/2).
Step 2: Generate a population P1 of Ma individuals
(bits). Cloning is then performed among the
individuals of P1 using Cp by exchanging the most
significant bits of P1 (MSB(P1)) and the least
significant bits of P1 (LSB(P1)), as shown in
Figure III, to form the population P2.
Step 3: Perform hypermutation by randomly mutating one
individual (bit) to form the population P3, as
shown in Figure III.
Step 4: Select the features using the population P3: a
feature is selected when the bit of P3 at its
index is 1.
Step 5: Calculate the fitness value Fv = (Tp + Tn)/(P + N)
using the classifier on the selected features,
where Tp is the number of true positive instances,
Tn is the number of true negative instances, P is
the number of positive instances and N is the
number of negative instances.
Step 6: If the fitness value Fv is satisfactory, the
selected features are considered significant
features. Otherwise, repeat from Step 2.
Initially, the maximum number of attributes Ma of the
given dataset is calculated. The clone point Cp is
calculated as ceil(Ma/2). Then the population P1 is
generated with Ma individuals (bits), and cloning is
performed among the individuals of P1 using Cp by
exchanging the most significant bits of P1 (MSB(P1)) and
the least significant bits of P1 (LSB(P1)), as shown in
Figure III, to form the population P2. Hypermutation is
then performed by randomly mutating one individual (bit)
to form the population P3, as shown in Figure III. The
features are selected using the population P3: a feature is
selected when the bit of P3 at its index is 1. The fitness
value Fv = (Tp + Tn)/(P + N) is calculated using the
classifier on the selected features, where Tp is the number
of true positive instances, Tn is the number of true
negative instances, P is the number of positive instances
and N is the number of negative instances. If the fitness
value Fv is satisfactory, the selected features are
considered significant features. Otherwise, the procedure
is repeated from Step 2.
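The steps of the proposed algorithm can be sketched in Python as follows. This is only one possible reading of the description, not the authors' Java implementation: the paper does not state how the MSB and LSB parts are delimited, what counts as a satisfactory fitness value, or how many repetitions are allowed. The sketch therefore assumes that cloning swaps the first Cp bits with the remaining bits, that the search stops once Fv reaches a caller-supplied threshold `target`, and that it gives up after `max_iterations` rounds. The names `evaluate`, `target`, `max_iterations` and `seed` are hypothetical, and keeping the best subset seen so far is an addition for convenience.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def clone_point(num_attributes: int) -> int:
    """Step 1: clone point Cp = ceil(Ma / 2)."""
    return math.ceil(num_attributes / 2)


def clone(p1: Sequence[int], cp: int) -> List[int]:
    """Step 2 (assumed reading): swap the leading cp bits (MSB part)
    with the remaining trailing bits (LSB part) to obtain P2."""
    return list(p1[cp:]) + list(p1[:cp])


def hypermutate(p2: Sequence[int], rng: random.Random) -> List[int]:
    """Step 3: flip one randomly chosen bit to obtain P3."""
    p3 = list(p2)
    index = rng.randrange(len(p3))
    p3[index] ^= 1
    return p3


def selected_features(p3: Sequence[int]) -> List[int]:
    """Step 4: indices of the bits of P3 that are set to 1."""
    return [i for i, bit in enumerate(p3) if bit == 1]


def fitness(tp: int, tn: int, p: int, n: int) -> float:
    """Step 5: Fv = (Tp + Tn) / (P + N), i.e. classification accuracy."""
    return (tp + tn) / (p + n)


def select_features(
    num_attributes: int,
    evaluate: Callable[[List[int]], float],
    target: float,
    max_iterations: int = 1000,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Repeat Steps 2 to 6 until the fitness reaches `target` (assumed
    stopping rule) or `max_iterations` rounds have been tried.
    `evaluate` is a hypothetical callback that trains and tests a
    classifier on the given feature indices and returns Fv."""
    rng = random.Random(seed)
    cp = clone_point(num_attributes)
    best_subset: List[int] = []
    best_fv = -1.0
    for _ in range(max_iterations):
        p1 = [rng.randint(0, 1) for _ in range(num_attributes)]
        p2 = clone(p1, cp)
        p3 = hypermutate(p2, rng)
        subset = selected_features(p3)
        if not subset:  # an empty feature set cannot be evaluated
            continue
        fv = evaluate(subset)
        if fv > best_fv:
            best_subset, best_fv = subset, fv
        if fv >= target:  # Step 6: fitness is satisfactory
            break
    return best_subset, best_fv
```

In practice `evaluate` would wrap a classifier such as J48, JRIP or RBF and return its accuracy on held-out data, which makes the procedure a wrapper style feature selector.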

FIGURE II
FLOWCHART REPRESENTATION

FIGURE III
REPRESENTATION OF CLONING OPERATION
IV. IMPLEMENTATION AND EXPERIMENTAL SETUP
The proposed system is implemented using the Java programming
language with NetBeans IDE 8.0. In order to conduct the
experiment, various real world datasets are collected from the
UCI machine learning repository [www.ics.uci.edu]. Three
supervised machine learning algorithms are then applied,
namely the function based radial basis function (RBF), the rule
based JRIP, and the tree based J48. The performance of the
proposed system is then evaluated using the classification
accuracy.
V. RESULT AND DISCUSSION
This section illustrates and discusses the experimental
results. Table I shows the classification accuracy of the
different classifiers on the respective datasets, with and
without the proposed system. From Table I, it is observed that
the proposed system reduces the number of features
substantially and produces equal or better classification
accuracy for the J48, JRIP, and RBF classifiers on all three
datasets; the only tie is RBF on the Labor dataset (94.73).

VI. CONCLUSION
This paper presented an artificial immune system based
organizational data prediction system. This proposed system is
implemented using the Java programming language. The
proposed system is tested on various real world datasets. The
performance of the proposed method is evaluated in terms of
classification accuracy of J48, JRIP, and RBF classifiers. This
optimizing method can improve the prediction performance of
the organizational data with fewer features.
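The figures reported in Table I can be summarized with a short script that computes, for each dataset and classifier, the percentage of features removed and the change in accuracy. This is an illustrative sketch only: the dictionary below is a hand transcription of Table I, and the names `TABLE_I` and `summarize` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hand transcription of Table I (an assumption of this sketch).
# FF is the full feature count; each classifier maps to
# (accuracy without the proposed method, selected features SF,
#  accuracy with the proposed method).
TABLE_I: Dict[str, dict] = {
    "Credit-a": {"FF": 15, "J48": (86.08, 5, 86.23),
                 "JRIP": (85.79, 7, 87.53), "RBF": (79.71, 5, 86.52)},
    "Credit-g": {"FF": 20, "J48": (70.50, 8, 74.90),
                 "JRIP": (71.70, 13, 73.10), "RBF": (74.00, 9, 74.30)},
    "Labor": {"FF": 16, "J48": (73.60, 7, 85.96),
              "JRIP": (77.19, 6, 91.22), "RBF": (94.73, 8, 94.73)},
}


def summarize(table: Dict[str, dict]) -> List[Tuple[str, str, float, float]]:
    """Return (dataset, classifier, % of features removed, accuracy change)."""
    rows = []
    for dataset, entry in table.items():
        full = entry["FF"]
        for classifier in ("J48", "JRIP", "RBF"):
            baseline, selected, accuracy = entry[classifier]
            reduction = 100.0 * (full - selected) / full
            rows.append((dataset, classifier,
                         round(reduction, 1), round(accuracy - baseline, 2)))
    return rows


for row in summarize(TABLE_I):
    print(row)
```

Expressing the reduction as a percentage of the full feature set makes the three datasets comparable even though they start from different feature counts.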

TABLE I
CLASSIFICATION ACCURACY OF THE DIFFERENT CLASSIFIERS WITH RESPECTIVE DATASET AND
PROPOSED METHOD

           Without proposed method    With proposed method
                                      J48         JRIP        RBF
Dataset    FF   J48    JRIP   RBF     SF  Acc     SF  Acc     SF  Acc
Credit-a   15   86.08  85.79  79.71    5  86.23    7  87.53    5  86.52
Credit-g   20   70.50  71.70  74.00    8  74.90   13  73.1     9  74.30
Labor      16   73.60  77.19  94.73    7  85.96    6  91.22    8  94.73

Acc: Classification accuracy, SF: Selected features, FF: Full set of features

VII. REFERENCES
[1] Seokhwan Yang et al, A prediction model based on big
data analysis using hybrid FCM clustering,
ICITST, 2014, p(337-339).
[2] Hua Yang et al, Artificial immune system based
intrusion detection, Creative Commons Attribution
License, 2014, p(110,121).
[3] Susmita Ganguli et al, AIS based image enhancement
technique, Springer International Publishing, 2015,
p(1-8).
[4] Zhonghua Li, Chunhui He, Jianming Li and Hong-Zhou
Tan, Adaptive hierarchical based AIS and its
application of RFID reader collision avoidance,
Applied Soft Computing, (2014), p(119-138).
[5] Kung-Jeng Wang, Kun-Huang Chen and Melani Adiran
Angelia, An improved immune recognition
system with the opposite sign test for feature
selection, Knowledge Based Systems, (2014), p(126-145).

[6] R. Sridevi, G. Jagajothi and Rajan Chattemvelli, AIS
based Intrusion Detection with Fisher Score Feature
Selection, IJCA, 2001, p(20-31).
[7] Neda Soltani Halvaiee and Mohammad Kazem
Akbari, A novel model for credit card fraud detection
using Artificial Immune Systems, Applied Soft
Computing, (2014), p(40-49).
[8] Junyuan Shen, Jidong Wang and Hao, An Improved
AIS Based Network Intrusion Detection by Using
Rough Set Theory, Communications and Network,
(2012), p(41-47).
[9] Liangpei Zhang, Yanfei Zhong and Pingxiang Li,
Applications of AIS in remote sensing image
classification, Springer Verlag, (2000), p(189-212).
[10] Arij Masmoudi, Sabeur Elkosantini, Sabeur Darmoul and
Habib Chabchoub, AIS for public transport
regulation, 9th International Conference on Modeling,
(2012), p(123-130).


[11] B. M. Vidyavathi and Dr. C. N. Ravikumar, A novel
hybrid filter feature selection method for data mining,
Ubiquitous Computing and Communication Journal,
(2003), p(118-121).
[12] Nilsson R., J. M. Peña, J. Björkegren and J. Tegnér,
Consistent Feature Selection for Pattern Recognition
in Polynomial Time. Journal of Machine Learning
Research 8 (2007) 589-612.
[13] Kohavi R. and G. H. John. Wrappers for feature subset
selection. Artificial Intelligence 97, 273-324, 1997.
[14] Efron B., T. Hastie, I. Johnstone, and R. Tibshirani.
Least angle regression. Annals of Statistics,
32(2):407-499, 2004.
[15] Baesens and Newman, Feature selection in marketing
applications, Springer Verlag, (2007), p(627-635).
[16] Jialei Wang, Peilin Zhao, Steven C.H. and Rong Jin, Online
feature selection and its applications, IEEE
Transactions on Knowledge and Data Engg, (2013), p(15).
[17] Bloch I., On fuzzy distances and their use in image
processing under imprecision, Pattern Recognition,
(1999), p(1873-1895).
[18] Erick C, Shawn Newsam and Chandrika Kamath, FS in
scientific applications, Astrophysical Journal,
(2000), p(450-599).
[19] Al Enezi et al, Artificial immune system applications
on cancer research, DeSE, 2011, p(168-171).
[20] Gandhi and Singh, Predictions of heart disease using
DM techniques, ABLAZE, (2015), p(520-525).
[21] Iba and Sasaki, In genetic programming to predict
financial data, Evolutionary Computation, (1999),
p(134-137).
[22] Maral Haghighat, Hamid Rastegari and Nasim Nourafza, A
review of DM techniques for result prediction in
sports, ACSIJ, (2013), p(1-6).
[23] Jyoti Soni, Ujma Ansari and Dipesh Sharma, Predictive
data mining for medical diagnosis, IJCA, (2011),
p(1-8).
[24] Habib Shah, Rozaida Ghazali, Nazri Mohad Nawi and
Nawserkhan, Boolean function prediction using
Hybrid Ant Bee Colony Algorithm, Computer
Science & Computational Mathematics, volume 2,
(2012), p(11-21).
[25] Chuanliang Chen et al, Artificial Immune System for
DNA microarray data analysis, Natural Computation,
vol(6), (2008), p(633-637).
[26] Weixiang Zhao and Cristina E. Davis, A modified AIS
based pattern identification approach of a clinic
application diagnostics, Artificial Intelligence
Method, (2011), p(1-9).
[27] Danasingh Asir Antony Gnana Singh, Subramanian
Appavu Alias Balamurugan, and Epiphany Jebamalar
Leavline. "An unsupervised feature selection
algorithm with feature ranking for maximizing
performance of the classifiers." International Journal
of Automation and Computing 12, no. 5 (2015): 511-517.
[28] S. Vidhya, D. Asir Antony Gnana Singh and
E. Jebamalar Leavline, Feature Extraction for
Document Classification, International Journal of
Innovative Research in Science, Engineering and
Technology, Vol. 4, Special Issue 6, May 2015, 50-56.
[29] D. Asir Antony Gnana Singh and E. Jebamalar Leavline,
Decision Making In Enterprise Computing: A Data
Mining Approach, International Journal Of Core
Engineering & Management (IJCEM), Volume 1,
Issue 11, February 2015, 103-113.
[30] D. Asir Antony Gnana Singh, P. Surenther and E.
Jebamalar Leavline, Ant Colony Optimization Based
Attribute Reduction for Disease Diagnostic System,
International Journal of Applied Engineering
Research, Vol. 10 No. 55 (2015).
[31] D. Asir Antony Gnana Singh and E. Jebamalar
Leavline, A Pragmatic Approach on Knowledge
Discovery in Databases with WEKA, International
Journal of Engineering Technology and Computer
Research (IJETCR), Volume 2, Issue 7, December
2014, Page No. 81-87.
[32] Danasingh Asir Antony Gnana Singh, Subramanian
Appavu Alias Balamurugan and Epiphany Jebamalar
Leavline, Improving the Accuracy of the Supervised
Learners using Unsupervised based Variable
Selection, Asian Journal of Information Technology,
Year: 2014, Volume: 13, Issue: 9, Page No.: 530-537.
[33] Danasingh Asir Antony Gnana Singh and E.
Jebamalar Leavline. "An empirical study on
dimensionality reduction and improvement of
classification accuracy using feature subset selection
and ranking." In Emerging Trends in Science,
Engineering and Technology (INCOSET), 2012
International Conference on, pp. 102-108. IEEE,
2012.
