
2012 - International Conference on Emerging Trends in Science, Engineering and Technology


An Empirical Study on Dimensionality Reduction and Improvement of Classification Accuracy Using Feature Subset Selection and Ranking
D. Asir Antony Gnana Singh
Department of CSE, M.I.E.T Engineering College, Tiruchirappalli-7, India
asirantony@gmail.com

S. Appavu Alias Balamurugan
Research Coordinator, K.L.N College of Information Technology, Madurai-15, India
app_s@yahoo.com

E. Jebamalar Leavline
Department of ECE, Anna University Chennai, BIT Campus, Tiruchirappalli-24, India
jebilee@gmail.com

Abstract - Data mining is a part of the knowledge discovery from data (KDD) process. The performance of data mining algorithms depends largely on the effectiveness of the preprocessing algorithms, and dimensionality reduction plays an important role in preprocessing. Many methods have been proposed for dimensionality reduction; among them, feature subset selection and feature ranking achieve significant dimensionality reduction by removing irrelevant and redundant features from high-dimensional data. This improves the prediction accuracy of the classifier, reduces the false prediction ratio, and reduces the time and space complexity of building the prediction model. This paper presents an empirical study of the feature subset evaluators Cfs, Consistency and Filtered, and the feature rankers Chi-squared and Information Gain. The performance of these methods is analyzed with a focus on dimensionality reduction and improvement of classification accuracy, using a wide range of test datasets and classification algorithms, namely the probability-based Naive Bayes, the tree-based C4.5 (J48) and the instance-based IB1.

Keywords - Data preprocessing; Dimensionality reduction; Feature subset selection; Classification accuracy; Data mining; Machine learning; Classifier.

I. INTRODUCTION
Dimensionality reduction for classification has attracted significant attention in both pattern recognition and machine learning. A high-dimensional data space may increase the computational cost and reduce the prediction accuracy of classifiers [1]-[3]. The classification process, known as supervised learning, builds a classifier by learning from a training data set [6], and it is observed that when the number of features in the training data exceeds a particular range for a given sample space, the accuracy of the classifier decreases [1], [2]. There are two ways to achieve dimensionality reduction: feature extraction and feature selection [5], [3].
In feature extraction [12], [13], [26], the original features in the measurement space are first transformed into a new dimension-reduced space via some specified transformation, and the significant features are then determined in the new space. Although the significant variables determined in the new space are related to the original variables, the physical interpretation in terms of the original variables may be lost. In addition, though the dimensionality may be greatly reduced using feature extraction methods such as principal component analysis (PCA) [14], the transformed variables usually involve all the original variables; often, the original variables are redundant when forming the transformed variables. In many cases, it is desirable to reduce not only the dimensionality of the transformed space, but also the number of variables that need to be considered or measured [15], [16], [26]. Unlike feature extraction, feature selection seeks optimal or suboptimal subsets of the original features [16]-[19], [31], [32] that preserve the significant information carried by the complete collected data, to facilitate feature analysis for high-dimensional problems [22]-[24], [26].
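To illustrate this point, the following minimal sketch (not part of the original study, which used Weka) projects the Iris data onto two principal components with scikit-learn and prints the loading matrix; every original variable contributes to both retained components, so none of the original measurements can actually be dropped.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                   # 150 samples, 4 original features

pca = PCA(n_components=2)              # keep only two transformed dimensions
Z = pca.fit_transform(X)               # dimensionality reduced: 150 x 2

# Each retained component is a linear combination of ALL four original
# variables, so no original measurement can actually be dropped.
print(Z.shape)                         # (150, 2)
print(np.round(pca.components_, 3))    # 2 x 4 loading matrix, entries non-zero
```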
This study mainly focuses on analyzing the performance of the Cfs, Consistency and Filtered attribute subset evaluators in view of dimensionality reduction, over a wide range of test datasets and learning algorithms, namely the probability-based learner Naive Bayes, the tree-based learner C4.5 (J48) and the lazy instance-based learner IB1.
A. Feature Subset Selection
Feature subset selection is a process that removes irrelevant and redundant features from the dataset to improve the prediction accuracy of the learning algorithm. Irrelevant features reduce the predictive accuracy, and redundant features deteriorate the performance of the learners and demand high computation time and other resources for training and testing [9], [8]. Feature subset selection methods can be classified into three categories, namely wrapper, filter and hybrid [10], [11]. In the wrapper approach, a predetermined learning model is assumed, and features are selected that justify the learning performance of that particular learning model [27], [11], whereas the filter approach relies on statistical analysis of the feature set without utilizing any learning model [28]. The hybrid approach is a combination of the filter and wrapper methods.
1) Subset Generation and Searching Methods: A search strategy and a criterion function are needed for subset selection. The search algorithm generates and compares candidate feature subsets by calculating their criterion function values as a measure of the effectiveness of each considered subset. The feature subset with the best criterion function value is given as the output of the feature selection algorithm [12], [5], [11], [4]. In general, several search strategies are followed to generate the feature subsets [30]-[33]. The first, sequential forward search, starts with an empty set and adds features one at a time. The second, sequential backward search, starts with the full set of attributes and at each step removes the worst attribute remaining in the set. The third, bidirectional selection, starts from both ends, adding and removing features concurrently. In the fourth, the search is carried out on a randomly selected subset using a sequential or bidirectional strategy. The fifth is complete search, which exhaustively examines the subsets and therefore yields the best solution, but it is not feasible when the number of features is large [11], [30]-[33]. A minimal sketch of the first strategy is given below.
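The sketch below is a generic illustration of sequential forward search; the `criterion` argument and the toy scoring function are placeholders for any subset evaluation function (such as the CFS merit or consistency measures discussed in Section II) and are not tied to the Weka implementations used in this study.

```python
def sequential_forward_search(n_features, criterion):
    """Greedy forward search: start from the empty set and, at each step,
    add the single feature that most improves the criterion value."""
    selected, remaining = [], list(range(n_features))
    best_score = float("-inf")
    while remaining:
        # score every candidate obtained by adding one more feature
        score, best_f = max((criterion(selected + [f]), f) for f in remaining)
        if score <= best_score:        # no candidate improves the subset any further
            break
        best_score = score
        selected.append(best_f)
        remaining.remove(best_f)
    return selected, best_score

# toy criterion: reward subsets containing features 0 and 2, penalise subset size
toy = lambda subset: len({0, 2} & set(subset)) - 0.1 * len(subset)
print(sequential_forward_search(4, toy))   # -> ([2, 0], 1.8)
```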
B. Feature Ranking
Feature ranking makes use of a scoring function S(i) computed from the values x_{k,i} and y_k (k = 1, ..., m examples and i = 1, ..., n features). By convention, it is assumed that a high score indicates high relevance and that features are sorted in decreasing order of S(i). We consider ranking criteria defined for individual features, independently of the context of the others. Features are ranked according to some evaluation measure, such as Chi-squared or Information Gain, producing a list of attributes and their ranks from the first to the last ranked feature [25]. A threshold value then decides which of the ranked features are selected.
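As a minimal illustration of this scheme (not one of the Weka rankers used later), the sketch below ranks features by an individual scoring function S(i); the absolute Pearson correlation with the class is used here purely as a stand-in score, and the data are synthetic.

```python
import numpy as np

def rank_features(X, y, score):
    """Rank features by an individual scoring function S(i); high score = relevant."""
    scores = np.array([score(X[:, i], y) for i in range(X.shape[1])])
    order = np.argsort(-scores)            # indices in decreasing order of S(i)
    return order, scores[order]

# stand-in S(i): absolute Pearson correlation between feature i and the class
abs_corr = lambda x, y: abs(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 3] + 0.1 * rng.normal(size=100) > 0).astype(float)   # class driven by feature 3
order, ranked_scores = rank_features(X, y, abs_corr)
print(order)                               # feature 3 should be ranked first
```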
C. Classification Algorithms
Classification algorithms can be employed to evaluate the performance of feature subset evaluators. Several algorithms are available, each with its own strengths and pitfalls, and no single learning algorithm works best on all supervised learning problems. The familiar algorithms Naive Bayes, C4.5 (J48) and IB1 have been chosen for the experimental setup of this study. This choice is supported by the most highlighted characteristics of each: the Naive Bayes classifier estimates the parameters (means and variances of the variables) necessary for classification from a minimal amount of training data; the C4.5 (J48) decision tree is simple to understand and interpret, requires little data preparation, is robust, and performs well on large data in a short time; and IB1 is able to learn quickly from a very small dataset.
1) Naive Bayes: In this classifier, classification is achieved using the basic Bayes theorem, and it gives relatively good performance on classification tasks [47]. In general, Naive Bayes (NB) learns under the assumption that the features are independent given the class variable. More formally, the classifier is defined by the discriminant function

f_i(X) = P(c_i) \prod_{j=1}^{N} P(x_j | c_i)     (1)

where X = (x_1, x_2, ..., x_N) denotes a feature vector and c_i denotes a possible class label. The training phase consists of estimating the conditional probabilities P(x_j | c_i) and the prior probabilities P(c_i). Here, P(c_i) is estimated by counting the training instances that fall into class c_i and dividing the count by the size of the training set. Similarly, the conditional probabilities are estimated by observing the frequency distribution of feature x_j within the training subset labeled as class c_i. To classify a class-unknown test vector, the posterior probability of each class is calculated given the feature values present in the test vector, and the test vector is assigned to the class with the highest probability [49].
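The following sketch illustrates this counting-based estimation for nominal features; it is a bare-bones reading of equation (1) and, unlike practical implementations such as Weka's NaiveBayes, applies no smoothing and no handling of numeric attributes.

```python
from collections import Counter, defaultdict

def train_nb(X, y):
    """Estimate P(c) and P(x_j | c) by simple frequency counting (no smoothing)."""
    class_counts = Counter(y)
    priors = {c: n / len(y) for c, n in class_counts.items()}
    cond = defaultdict(Counter)                  # (class, feature index) -> value counts
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            cond[(c, j)][v] += 1
    return priors, cond, class_counts

def classify_nb(x, priors, cond, class_counts):
    """Return the class maximising P(c) * prod_j P(x_j | c), as in equation (1)."""
    def score(c):
        p = priors[c]
        for j, v in enumerate(x):
            p *= cond[(c, j)][v] / class_counts[c]   # relative frequency of v in class c
        return p
    return max(priors, key=score)

# toy nominal data
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
y = ["no", "no", "yes", "yes"]
model = train_nb(X, y)
print(classify_nb(("rain", "mild"), *model))     # -> yes
```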
2) C4.5 (J48): Many techniques can be used to build a decision tree. In all of them the given data are formed into a tree structure whose branches represent associations between feature values and class labels; C4.5 (J48) is the most familiar and among the best of these techniques [48]. It partitions the training dataset in a recursive fashion, based on examining the potential of the feature values for separating the classes. The decision tree learns from a set of training instances through an iterative process of choosing a feature and splitting the dataset according to the values of that feature. Entropy, or information gain, is used to select the most representative features for classification; the selected features have the lowest entropy and the highest information gain. The learning algorithm proceeds as follows: first, compute the entropy measure for each feature; second, partition the dataset according to the possible values of the feature with the lowest entropy; third, estimate probabilities in exactly the same way as the Naive Bayes approach [49]. Although features are chosen one at a time in a greedy manner, each choice depends on the results of the previous tests.
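The sketch below illustrates only the split-selection step described above: it computes entropy and information gain for nominal features and picks the best splitting attribute. It is not Quinlan's full C4.5 (no gain ratio, pruning, or continuous-attribute handling), and the data and function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum p log2 p over the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, j):
    """Reduction in class entropy obtained by splitting on feature j."""
    n = len(labels)
    remainder = sum(
        (cnt / n) * entropy([l for r, l in zip(rows, labels) if r[j] == v])
        for v, cnt in Counter(r[j] for r in rows).items())
    return entropy(labels) - remainder

def best_split(rows, labels):
    """Choose the feature with the highest information gain (lowest remaining entropy)."""
    return max(range(len(rows[0])), key=lambda j: info_gain(rows, labels, j))

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
print(best_split(rows, labels))    # -> 0: the first feature separates the classes perfectly
```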

3) IB1: This classifier works on the nearest neighbour classification principle. The distance between each training instance and the given test instance is calculated with the Euclidean distance measure; if more than one instance has the smallest distance to the test instance, the first one found is used. Nearest neighbour is one of the most significant learning algorithms and can be adapted to solve a wide range of problems [46]. To classify an unclassified vector X, the algorithm ranks the neighbours of X amongst a given set of N data points (X_i, c_i), i = 1, 2, ..., N, and uses the class labels c_j (j = 1, 2, ..., K) of the K most similar neighbours to predict the class of the new vector X. In particular, the classes of these neighbours are weighted using the similarity between X and each of its neighbours, where similarity is measured by the Euclidean distance metric. X is then assigned the class label with the greatest number of votes among the K nearest class labels.
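A minimal sketch of this nearest-neighbour rule for the K = 1 case (as in IB1) is given below; ties are broken by taking the first instance found, as described above, and the toy data are illustrative.

```python
import numpy as np

def ib1_classify(X_train, y_train, x):
    """1-NN: return the label of the closest training instance by Euclidean
    distance; on a tie, the first instance found wins (argmin keeps the first)."""
    dists = np.linalg.norm(np.asarray(X_train, dtype=float) - np.asarray(x, dtype=float), axis=1)
    return y_train[int(np.argmin(dists))]

X_train = [[1.0, 1.0], [1.1, 0.9], [5.0, 5.0]]
y_train = ["a", "a", "b"]
print(ib1_classify(X_train, y_train, [4.6, 5.2]))   # -> b
```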
This learner works on the presumption that the classification of an instance is likely to be most similar to the classification of other instances that are nearby in the vector space. It does not depend on prior probabilities, unlike learning algorithms such as NB. It is computationally cost-effective when the dataset is small, but the distance calculation becomes expensive when the dataset is large; this computational cost can be reduced by adopting PCA or information gain based feature ranking for dimensionality reduction.

The rest of this article is organized as follows. In Section II, the related work is reviewed. Section III elucidates the proposed work. Section IV presents the experimental results with discussion. In Section V, the conclusion is drawn with future directions.
II. RELATED WORK
Many feature subset evaluator and ranker algorithms have been proposed for choosing the most relevant feature subsets from datasets. In this section, the feature subset selection algorithms Cfs, Consistency and Filtered subset evaluators, and the feature rankers Chi-squared and Information Gain, are discussed.
a. CFS
In the CFS (Correlation-based Feature Selection) method, subsets of features are evaluated rather than individual features [38], [39], [43]. The kernel of this heuristic is that the effectiveness of the individual features for predicting the class is weighed against the degree of inter-correlation among them. The goodness of a feature subset is determined by the heuristic in equation (2), on the basis that a good subset contains features that are highly correlated with the class and weakly inter-correlated with each other.

Merit_S = k \bar{r}_{cf} / \sqrt{k + k(k-1) \bar{r}_{ff}}     (2)

where S is a subset containing k features, \bar{r}_{cf} is the average feature-class correlation, and \bar{r}_{ff} is the average feature-feature inter-correlation. The numerator can be thought of as indicating how predictive a group of features is; the denominator, how much redundancy there is among them. This heuristic handles irrelevant features, as they will be poor predictors of the class, while redundant features are discriminated against because they will be highly correlated with one or more of the other features. Since features are treated independently, CFS cannot identify strongly interacting features, such as in a parity problem. However, it has been shown that it can identify useful features under moderate levels of interaction [38], [43].
In order to apply equation (2), it is necessary to compute the correlation (dependence) between features. Cfs first discretizes numeric features using the technique discussed in [42], [43] and then uses symmetrical uncertainty to estimate the degree of association between discrete features X and Y:

SU = 2.0 [H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)]     (3)

After computing a correlation matrix, Cfs applies a heuristic search strategy to find a good subset of features according to equation (2). The modified forward selection search is used, which produces a list of features ranked according to their contribution to the goodness of the set.
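The sketch below evaluates equation (2) for nominal features, using equation (3)'s symmetrical uncertainty as the correlation measure; the discretisation and forward-search steps of the actual Cfs evaluator are omitted, and the toy data are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(a, b):
    """Equation (3): SU = 2.0 [H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)]."""
    hx, hy = entropy(a), entropy(b)
    return 2.0 * (hx + hy - entropy(list(zip(a, b)))) / (hx + hy)

def cfs_merit(columns, y, subset):
    """Equation (2): Merit_S = k r_cf / sqrt(k + k(k - 1) r_ff)."""
    k = len(subset)
    r_cf = sum(symmetrical_uncertainty(columns[f], y) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(symmetrical_uncertainty(columns[a], columns[b]) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# feature 0 predicts the class perfectly, feature 2 is irrelevant noise
cols = [["a", "a", "b", "b"], ["a", "a", "b", "b"], ["x", "y", "x", "y"]]
y = [0, 0, 1, 1]
print(round(cfs_merit(cols, y, [0]), 3),       # -> 1.0
      round(cfs_merit(cols, y, [0, 2]), 3))    # -> 0.707: the irrelevant feature lowers the merit
```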
b. Consistency
In consistency-based subset evaluation, several approaches use class consistency as an evaluation metric in order to select the feature subset [40], [41]. These methods look for combinations of features whose values divide the data into subsets containing a strong single-class majority. Usually the search is biased in favour of small feature subsets with high class consistency. Our consistency-based subset evaluator uses the consistency metric of [41]:
Consistency_S = 1 - \sum_{i=0}^{J} (D_i - M_i) / N     (4)

where S is a feature subset, J is the number of distinct combinations of feature values for S, D_i is the number of occurrences of the i-th feature value combination, M_i is the cardinality of the majority class for the i-th feature value combination, and N is the total number of instances in the data set. Data sets with numeric features are first discretized using the methods found in [42], [43]. The modified forward selection search described earlier in this section is used to produce a list of features, ranked according to their overall contribution to the consistency of the feature set.
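A minimal sketch of the consistency measure of equation (4) for nominal features follows; the search over candidate subsets is omitted, and the toy data are illustrative.

```python
from collections import Counter

def consistency(rows, y, subset):
    """Equation (4): 1 - sum_i (D_i - M_i) / N, where D_i counts the occurrences of
    the i-th distinct value combination of the subset and M_i its majority-class count."""
    groups = {}                                   # value combination -> class counts
    for row, label in zip(rows, y):
        key = tuple(row[f] for f in subset)
        groups.setdefault(key, Counter())[label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return 1.0 - inconsistent / len(rows)

rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
y = [0, 0, 1, 1]
print(consistency(rows, y, [0]), consistency(rows, y, [1]))   # -> 1.0 0.5
```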
c. Filter
This approach is most suitable for reducing the dimensionality of the data rather than for training a classifier. The subset selection takes advantage of the Cfs subset evaluator combined with the Greedy Stepwise search algorithm, and it reduces the dimensionality of high-dimensional data as much as possible [44].

d. Chi-Squared
This feature ranker uses the chi-squared (χ²) test [50]. It estimates the worth of a feature by computing the value of the chi-squared statistic with respect to the class. The initial hypothesis H0 is the assumption that the two variables (feature and class) are unrelated, and it is tested by the chi-squared formula in equation (5):

\chi^2 = \sum_i \sum_j (O_{ij} - E_{ij})^2 / E_{ij}     (5)

where O_{ij} is the observed frequency and E_{ij} is the expected (theoretical) frequency asserted by the null hypothesis. The greater the value of χ², the greater the evidence against the hypothesis H0.
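The sketch below computes the statistic of equation (5) from the feature-by-class contingency table of a single nominal feature; a full attribute evaluator would also discretise numeric attributes first, and the example data are illustrative.

```python
from collections import Counter

def chi_squared(feature, y):
    """Equation (5): sum over the feature-by-class contingency table of
    (O_ij - E_ij)^2 / E_ij, where E_ij assumes feature and class are independent."""
    n = len(y)
    observed = Counter(zip(feature, y))
    f_tot, c_tot = Counter(feature), Counter(y)
    stat = 0.0
    for fv in f_tot:
        for cv in c_tot:
            expected = f_tot[fv] * c_tot[cv] / n
            stat += (observed[(fv, cv)] - expected) ** 2 / expected
    return stat

feature = ["a", "a", "b", "b", "a", "b"]
y = [1, 1, 0, 0, 1, 0]
print(chi_squared(feature, y))   # -> 6.0: strong evidence against independence
```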
e. Information Gain
This is a ranker-based feature selection measure using information theory. Given that entropy is a criterion of impurity in a training set S, a measure is defined that reflects the additional information about Y provided by X, that is, the amount by which the entropy of Y decreases [51]. The Information Gain (Info-Gain) measure is formulated in equation (6):

Info-Gain = H(Y) - H(Y|X) = H(X) - H(X|Y)     (6)

The information gained about Y after observing X is equal to the information gained about X after observing Y. A weakness of the IG criterion is that it is biased in favor of features with more values, even when they are not more informative.
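The sketch below computes equation (6) for a nominal feature and checks the stated symmetry H(Y) - H(Y|X) = H(X) - H(X|Y) numerically; the data and function names are illustrative.

```python
import math
from collections import Counter

def H(values):
    """Shannon entropy of a discrete variable."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def H_cond(a, b):
    """Conditional entropy H(A | B) = H(A, B) - H(B)."""
    return H(list(zip(a, b))) - H(b)

def information_gain(x, y):
    """Equation (6): Info-Gain = H(Y) - H(Y | X)."""
    return H(y) - H_cond(y, x)

x = ["red", "red", "blue", "blue", "blue"]
y = ["+", "+", "-", "-", "+"]
print(round(information_gain(x, y), 3),         # -> 0.42
      round(H(x) - H_cond(x, y), 3))            # -> 0.42, equal by symmetry
```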
III. PROPOSED WORK
For the proposed work, the experimental setup was constructed with three feature subset selection techniques, two feature ranking methods and ten standard machine learning data sets from the Weka data set collection and the UCI Machine Learning Repository [44], [52]. These data sets range in size from 5 to 36 features and from 14 to 1728 instances. The experiments were carried out with the Weka data mining tool [44]. The three feature subset evaluators and two feature rankers were evaluated on these datasets with the well-known classification algorithms, namely the probability-based Naive Bayes (NB), the tree-based C4.5 (J48) and the instance-based IB1.
To balance the improvement in dimensionality reduction against classification accuracy, a threshold value T_v is formulated and fixed on a trial and error basis as shown in equation (7). Features whose rank value is greater than the threshold are selected from the feature sets ranked by the Chi-squared and Information Gain rankers.

T_v = Min + (Max - Min) / 2     (7)

where T_v is the threshold value, Min is the minimum rank value and Max is the maximum rank value.
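The sketch below applies this threshold to a set of hypothetical ranker scores; it assumes the midpoint reading of equation (7) given above and is not the authors' Weka workflow.

```python
def select_by_threshold(scores):
    """Keep features whose rank score exceeds Tv = Min + (Max - Min) / 2 (equation 7)."""
    lo, hi = min(scores.values()), max(scores.values())
    tv = lo + (hi - lo) / 2.0
    return [f for f, s in scores.items() if s > tv]

# hypothetical chi-squared / information-gain scores for five features
scores = {"f1": 12.4, "f2": 0.3, "f3": 7.9, "f4": 0.0, "f5": 9.6}
print(select_by_threshold(scores))   # Tv = 6.2 -> ['f1', 'f3', 'f5']
```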
IV. RESULTS AND DISCUSSION
The summary of the datasets used in the experiments and the experimental results derived from the analyses of dimensionality reduction and improvement of classifier accuracy are shown in Table I, Table II and Table III, respectively.
TABLE I. SUMMARY OF DATA SETS

S.No.  Dataset          Instances  Features  Classes
1      Contact Lenses   24         5         3
2      Diabetes         768        9         2
3      Glass            214        10        7
4      Ionosphere       351        35        2
5      Iris             150        5         3
6      Labor            57         17        2
7      Soybean          683        36        19
8      Vote             435        17        2
9      Weather          14         5         2
10     Car              1728       7         4

TABLE II. COMPARISON OF REDUCED FEATURE SUBSETS BY FEATURE SUBSET EVALUATORS AND RANKERS

                        Feature Subset Evaluators           Feature Rankers
S.No.  Dataset          Cfs   Consistency   Filtered        Chi-squared   Information Gain
1      Contact Lenses
2      Diabetes         4     4             3               3             3
3      Glass            8     7             5               7             8
4      Ionosphere       14    7             14              25            25
5      Iris             2     2             2               2             2
6      Labor            7     4             4               3             3
7      Soybean          22    13            7               10            10
8      Vote             4     10            1               4             3
9      Weather          2     2             1               1             1
10     Car              1     6             1               2             1

In the dimensionality reduction analysis, it is observed that the feature subset evaluators Filter, Consistency and Cfs and the feature rankers Chi-squared and Information Gain all reduce the dimensionality considerably, as shown in Figure 1. The performance of the Filter is superior to all the other methods in terms of dimensionality reduction, regardless of the classifier. The Chi-squared ranker achieves better performance than Information Gain, Consistency and Cfs.
TABLE III. SUMMARY OF CLASSIFIERS ACCURACY (%) WITH RESPECT TO THE FEATURE SUBSET EVALUATORS AND RANKERS
(I - Cfs, II - Consistency, III - Filter, IV - Chi-square Ranker, V - Information Gain Ranker)

Accuracy of NB on reduced feature subsets:
S.No.  Dataset          I      II     III    IV     V
1      Contact Lenses   70.83  70.83  70.83  87.50  87.50
2      Diabetes         77.47  77.47  76.43  76.43  76.43
3      Glass            47.66  44.39  44.39  49.06  47.66
4      Ionosphere       92.02  87.17  92.02  83.47  83.19
5      Iris             96.00  96.00  96.00  96.00  96.00
6      Labor            91.22  87.71  87.71  84.21  84.21
7      Soybean          87.11  81.69  83.30  81.25  82.72
8      Vote             96.09  92.41  95.63  92.87  94.71
9      Weather          57.14  57.14  50.00  50.00  50.00
10     Car              70.02  85.53  70.02  76.85  76.85
       Average          78.55  78.03  76.63  77.76  77.92

Accuracy of C4.5 (J48) on reduced feature subsets:
S.No.  Dataset          I      II     III    IV     V
1      Contact Lenses   70.83  83.33  70.83  87.50  87.50
2      Diabetes         74.86  74.86  74.60  74.60  73.04
3      Glass            68.69  70.09  65.88  69.62  68.69
4      Ionosphere       90.59  87.46  90.59  91.45  91.16
5      Iris             96.00  96.00  96.00  96.00  96.00
6      Labor            77.19  82.45  80.70  80.70  80.70
7      Soybean          85.65  83.74  82.86  80.81  82.86
8      Vote             96.09  96.32  95.63  95.63  95.63
9      Weather          42.85  42.85  50.00  50.00  50.00
10     Car              70.02  92.36  70.02  76.56  76.56
       Average          77.27  80.94  77.71  80.28  80.21

Accuracy of IB1 on reduced feature subsets:
S.No.  Dataset          I      II     III    IV     V
1      Contact Lenses   66.66  62.50  66.66  75.00  75.00
2      Diabetes         68.35  68.35  70.18  70.18  68.48
3      Glass            71.02  70.09  71.02  77.57  71.02
4      Ionosphere       88.88  87.74  88.88  88.03  88.03
5      Iris             96.66  96.66  96.66  96.66  96.66
6      Labor            84.21  87.71  87.71  80.70  80.70
7      Soybean          83.89  76.57  79.94  79.20  78.91
8      Vote             94.02  93.33  91.72  93.79  93.10
9      Weather          78.57  78.57  64.28  64.28  64.28
10     Car              66.84  77.25  66.84  73.49  73.49
       Average          79.91  79.87  78.38  79.89  78.96

Fig. 1: Performance comparison in dimensionality reduction of Feature Subset Evaluators and Rankers

In the classification accuracy analysis, it is observed that the performance of Cfs is superior for the instance-based IB1 and the probability-based Naive Bayes (NB), and inferior for the tree-based C4.5 (J48). Consistency performs well with C4.5 (J48) compared to the other methods. The performance of Chi-squared is considerably better with NB and C4.5 (J48) than that of Information Gain and the Filtered method, as shown in Figure 2.


Fig. 2: Comparison of accuracy of classifiers with Feature Subset Evaluators and Rankers

V. CONCLUSION
This paper presented an experimental analysis of dimensionality reduction and improvement of classifier accuracy by the feature subset evaluators Cfs, Consistency and Filter and the feature rankers Chi-squared and Information Gain. From this experimental study, it is observed that, in the dimensionality reduction analysis, the performance of the Filter is superior to all the other methods regardless of the classifier, and the Chi-squared ranker achieves better performance than Information Gain, Consistency and Cfs. In the classification accuracy analysis, the performance of Cfs is superior for the instance-based IB1 and the probability-based Naive Bayes (NB) and inferior for the tree-based C4.5 (J48); Consistency performs well with C4.5 (J48) compared to the other methods; and the performance of Chi-squared is considerably better with NB and C4.5 (J48) than that of Information Gain and the Filtered method.
REFERENCES
[1] C.-I. Chang and S. Wang, "Constrained band selection for hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 44, no. 6, pp. 1575-1585, Jun. 2006.
[2] A. Plaza, P. Martinez, J. Plaza, and R. Perez, "Dimensionality reduction and classification of hyperspectral image data using sequences of extended morphological transformations," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 466-479, Mar. 2005.
[3] Liangpei Zhang, Yanfei Zhong, Bo Huang, Jianya Gong, and Pingxiang Li, "Dimensionality reduction based on clonal selection for hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 12, Dec. 2007.
[4] J. Wang and C.-I. Chang, "Independent component analysis-based dimensionality reduction with applications in hyperspectral image analysis," IEEE Trans. Geosci. Remote Sens., vol. 44, no. 6, pp. 1586-1600, Jun. 2006.
[5] A. Jain and D. Zongker, "Feature selection: Evaluation, application, and small sample performance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 2, pp. 153-158, Feb. 1997.
[6] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, second ed., Elsevier, 2006.
[7] Xuechuan Wang and Kuldip K. Paliwal, "Feature extraction and dimensionality reduction algorithms and their applications in vowel recognition," Pattern Recognition, 36 (2003) 2429-2439.
[8] Qinbao Song, Jingjie Ni and Guangtao Wang, "A fast clustering-based feature subset selection algorithm for high dimensional data," IEEE Trans. Knowl. Data Eng., 2011.
[9] John, G.H., Kohavi, R. and Pfleger, K., "Irrelevant features and the subset selection problem," in Proceedings of the Eleventh International Conference on Machine Learning, pp. 121-129, 1994.
[10] Oh, I., Lee, J. and Moon, B., "Hybrid genetic algorithms for feature selection," IEEE Trans. Pattern Anal. Mach. Intell., 26(11), 1424-1437, 2004.
[11] Md. Monirul Kabir, Md. Shahjahan and Kazuyuki Murase, "A new hybrid ant colony optimization algorithm for feature selection," Expert Systems with Applications, 39, 3747-3763, 2012.
[12] A.K. Jain, R.P.W. Duin, and J. Mao, "Statistical pattern recognition: A review," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 1, pp. 4-37, Jan. 2000.
[13] A.R. Webb, Statistical Pattern Recognition, second ed., Wiley, 2002.

[14] I.T. Jolliffe, Principal Component Analysis, second ed., Springer, 2002.
[15] G.P. McCabe, "Principal variables," Technometrics, vol. 26, pp. 137-144, May 1984.
[16] W.J. Krzanowski, "Selection of variables to preserve multivariate data structure using principal components," Applied Statistics, vol. 36, no. 1, pp. 22-33, 1987.
[17] P. Mitra, C.A. Murthy, and S.K. Pal, "Unsupervised feature selection using feature similarity," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp. 301-312, Mar. 2002.
[18] B. Krishnapuram, A.J. Hartemink, L. Carin, and M.A.T. Figueiredo, "A Bayesian approach to joint feature selection and classifier design," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pp. 1105-1111, Sept. 2004.
[19] M.H.C. Law, M.A.T. Figueiredo, and A.K. Jain, "Simultaneous feature selection and clustering using mixture models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pp. 1154-1166, Sept. 2004.
[20] R. Kohavi and G.H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, nos. 1-2, pp. 273-324, Dec. 1997.
[21] A.J. Miller, Subset Selection in Regression, Chapman and Hall, 1990.
[22] P. Pudil, J. Novovicova, and J. Kittler, "Floating search methods in feature selection," Pattern Recognition Letters, vol. 15, no. 11, pp. 1119-1125, Nov. 1994.
[23] S.K. Pal, R.K. De, and J. Basak, "Unsupervised feature evaluation: A neuro-fuzzy approach," IEEE Trans. Neural Networks, vol. 11, no. 2, pp. 366-376, Mar. 2000.
[24] K.Z. Mao, "Identifying critical variables of principal components for unsupervised feature selection," IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 35, pp. 339-344, 2005.
[25] I.T. Jolliffe, "Discarding variables in a principal component analysis-I: Artificial data," Applied Statistics, vol. 21, no. 2, pp. 160-173, 1972.
[26] Hua-Liang Wei and Stephen A. Billings, "Feature subset selection and ranking for data dimensionality reduction," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, Jan. 2007.
[27] Guyon, I. and Elisseeff, A., "An introduction to variable and feature selection," Journal of Machine Learning Research, 3, 1157-1182, 2003.
[28] Dash, M. and Liu, H., "Feature selection for classification," Intelligent Data Analysis, (1), 131-156, 1997.
[29] Huang, J., Cai, Y. and Su, X., "A hybrid genetic algorithm for feature selection wrapper based on mutual information," Pattern Recognition Letters, 28, 1825-1844, 2007.
[30] Guan, S., Liu, J. and Qi, Y., "An incremental approach to contribution-based feature selection," Journal of Intelligent Systems, 13(1), 2004.
[31] Peng, H., Long, F. and Ding, C., "Overfitting in making comparisons between variable selection methods," Journal of Machine Learning Research, 3, 1371-1382, 2003.
[32] Gasca, E., Sanchez, J.S. and Alonso, R., "Eliminating redundancy and irrelevance using a new MLP-based feature selection method," Pattern Recognition, 39, 313-315, 2006.
[33] Hsu, C., Huang, H. and Schuschel, D., "The ANNIGMA-wrapper approach to fast feature selection for neural nets," IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 32(2), 207-212, 2002.
[34] Caruana, R. and Freitag, D., "Greedy attribute selection," in Proceedings of the 11th International Conference on Machine Learning, Morgan Kaufmann, 1994.
[35] Lai, C., Reinders, M.J.T. and Wessels, L., "Random subspace method for multivariate feature selection," Pattern Recognition Letters, 27, 1067-1076, 2006.
[36] Stracuzzi, D. J. and Utgoff, P. E., "Randomized variable elimination," Journal of Machine Learning Research, 5, 1331-1335, 2004.
[37] Liu, H. and Yu, L., "Toward integrating feature selection algorithms for classification and clustering," IEEE Transactions on Knowledge and Data Engineering, 17(4), 491-502, 2004.
[38] M. A. Hall, "Correlation-based feature selection for machine learning," Ph.D. thesis, Department of Computer Science, University of Waikato, Hamilton, New Zealand, 1998.
[39] Mark Hall, "Correlation-based feature selection for discrete and numeric class machine learning," in Proc. of the 17th International Conference on Machine Learning (ICML 2000), 2000.
[40] H. Almuallim and T. G. Dietterich, "Learning with many irrelevant features," in Proceedings of the Ninth National Conference on Artificial Intelligence, 1991, pp. 547-552, AAAI Press.
[41] H. Liu and R. Setiono, "A probabilistic approach to feature selection: A filter solution," in Proceedings of the 13th International Conference on Machine Learning, 1996, pp. 319-327, Morgan Kaufmann.
[42] U. M. Fayyad and K. B. Irani, "Multi-interval discretisation of continuous-valued attributes," in Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, 1993, pp. 1022-1027, Morgan Kaufmann.
[43] Mark A. Hall and Geoffrey Holmes, "Benchmarking attribute selection techniques for discrete class data mining," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 3, May/June 2003.

[44] Remco R. Bouckaert, Eibe Frank, Mark Hall, Richard Kirkby, Peter Reutemann, Alex Seewald and David Scuse, WEKA Manual for Version 3-6-6, University of Waikato, Hamilton, New Zealand, October 28, 2011.
[45] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proc. of the Tenth National Conference on Artificial Intelligence, San Jose, CA, 1992, pp. 223-228, AAAI Press.
[46] Kuramochi, M. and Karypis, G., "Gene classification using expression profiles: a feasibility study," International Journal on Artificial Intelligence Tools, 14(4) (2005) 641-660.
[47] Domingos, P. and Pazzani, M., "Feature selection and transduction for prediction of molecular bioactivity for drug design," Machine Learning, 29 (1997) 103-130.
[48] Xing, E. P., Jordan, M. I. and Karp, R. M., "Feature selection for high-dimensional genomic microarray data," Proceedings of the 18th International Conference on Machine Learning, 2001, 601-608.
[49] J. Novakovic, P. Strbac and D. Bulatovic, "Toward optimal feature selection using ranking methods and classification algorithms," Yugoslav Journal of Operations Research, 21 (2011), Number 1, 119-135.
[50] Hall, M.A. and Smith, L.A., "Practical feature subset selection for machine learning," Proceedings of the 21st Australian Computer Science Conference, 1998, 181-191.
[51] Liu, H. and Setiono, R., "Chi2: Feature selection and discretization of numeric attributes," Proc. IEEE 7th International Conference on Tools with Artificial Intelligence, 1995, 338-391.
[52] D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz, "UCI Repository of Machine Learning Databases," http://www.ics.uci.edu/~mlearn/MLRepository.html, 2006.
