
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002

RBF Neural Network Center Selection Based on Fisher Ratio Class Separability Measure

K. Z. Mao

Abstract—For classification applications, the role of the hidden layer neurons of a radial basis function (RBF) neural network can be interpreted as a function which maps input patterns from a nonlinearly separable space to a linearly separable space. In the new space, the responses of the hidden layer neurons form new feature vectors. The discriminative power is then determined by the RBF centers. In the present study, we propose to choose RBF centers based on the Fisher ratio class separability measure with the objective of achieving maximum discriminative power. We implement this idea using a multistep procedure that combines Fisher ratio, an orthogonal transform, and a forward selection search method. Our motivation for employing the orthogonal transform is to decouple the correlations among the responses of the hidden layer neurons so that the class separability provided by individual RBF neurons can be evaluated independently. The strengths of our method are twofold. First, our method selects a parsimonious network architecture. Second, this method selects centers that provide large class separation.

Index Terms—Center selection, Fisher's class separability measure, pattern classification, radial basis function (RBF) neural networks.

I. INTRODUCTION

DUE to its structural simplicity, the radial basis function (RBF) neural network has been widely used for approximation and learning [1]. The standard RBF neural network consists of three layers: an input layer, a hidden layer, and an output layer. Training of RBF neural networks involves selecting the hidden layer neuron centers and estimating the weights that connect the hidden and the output layers. Nonlinear optimization algorithms, such as gradient descent, are the most straightforward approaches to RBF neural network training [1]. However, most nonlinear optimization algorithms suffer from long training times and the possibility of being trapped in local minima. Very often the RBF neural network is trained using a two-step procedure. In the first step, the RBF centers are determined empirically. Then the weights are estimated using linear least squares algorithms. An intuitive solution to center determination is to randomly select data points from the training set [2] or to randomly generate data points from the input space [3]. It is noted, however, that randomly selected centers cannot be guaranteed to cover the training data well. The classical approach to locating the neuron centers is to apply clustering techniques, such as k-means clustering [4] and vector quantization [5], which form templates of the input. An alternative to input clustering is input-output clustering [6], [7]. Input-output clustering differs from input clustering in that it determines center locations based not only on the input, but also on the output deviations. Besides the clustering methods, the orthogonal forward selection algorithm [8]–[10] is another frequently used method for RBF center selection. The basic idea of this method is to introduce an orthogonal transform to facilitate the center selection procedure. RBF centers can also be determined using the recently developed support vector machine (SVM) method [11], [12]. The basic idea of SVM is to determine the structure of the classifier by minimizing the bounds of the training error and the generalization error. Usually, the centers selected using SVM are close to the boundary of the decision surface. In contrast, the centers selected by clustering are templates or stereotypical patterns of the training samples.

The hidden layer of the RBF neural network classifier can be viewed as a function that maps the input patterns from a nonlinearly separable space to a linearly separable space. In the new space, the responses of the hidden layer neurons form new feature vectors for pattern representation. The discriminative power is then determined by the RBF centers. In the present study, we propose to select RBF centers based on their ability to provide the largest class separability. Two measures related to class separability are the interclass difference and the intraclass spread. Fisher ratio, which is defined as the ratio of the interclass difference to the intraclass spread, is a good combination of these two measures [13]. The advantages of employing Fisher ratio are that it requires no explicit coding of class labels and that it can deal with heavily unbalanced classes. These advantages are retained when Fisher ratio is combined with a multilayer perceptron (MLP) neural network [14]. In our study, we incorporate Fisher ratio into an orthogonal transform and a forward selection procedure to evaluate and select RBF centers. The motivation for introducing the orthogonal transform is to decouple the correlations among the responses of the hidden layer neurons so that the class separability provided by individual neurons can be evaluated independently. The strengths of our method are twofold. First, this method is able to find a parsimonious network architecture. Second, this method selects centers that provide large class separability.

This paper is organized as follows. Methods for RBF center selection in the context of nonlinear approximation are briefly reviewed in Section II. In Section III, a new method that selects RBF centers based on the Fisher ratio class separability measure is developed. Numerical examples are presented in Section IV to demonstrate the effectiveness of our algorithm. Finally, some concluding remarks are given in Section V.

Manuscript received January 4, 2001; revised July 11, 2001. The author is with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore (e-mail: ekzmao@ntu.edu.sg).
II. RBF CLASSIFIER CENTER SELECTION IN THE FRAMEWORK OF NONLINEAR APPROXIMATION

The standard RBF neural network with a single output neuron realizes a mapping function $f: \mathbb{R}^{d} \rightarrow \mathbb{R}$, where the $d$-dimensional input vector $\mathbf{x}$ is submitted to the network and the scalar output is obtained to make the classification decision. The output of the RBF neural network can be described by

$$f(\mathbf{x}) = w_0 + \sum_{i=1}^{m} w_i \exp\left(-\frac{\|\mathbf{x}-\mathbf{c}_i\|^2}{\sigma^2}\right) \qquad (1)$$

where $m$ is the number of hidden layer neurons, $\mathbf{c}_i$ and $\sigma$ are the RBF centers and width, respectively, and $w_0$ and $w_i$ are the weights. For a typical two-class classification problem, if the desired output is coded as "1" for samples from class 1 and "$-1$" for samples from class 2, the classifier determines the class label of the input vector $\mathbf{x}$ as

$$\hat{y} = \operatorname{sgn}\bigl(f(\mathbf{x})\bigr) \qquad (2)$$

where the sign function is given by

$$\operatorname{sgn}(v) = \begin{cases} 1, & \text{if } v \ge 0 \\ -1, & \text{if } v < 0. \end{cases}$$

Equation (1) contains several adjustable parameters, including the centers $\mathbf{c}_i$, the width $\sigma$, and the weights $w_0$ and $w_i$. A suitable value of $\sigma$ could improve classification results. Estimation of $\sigma$ could be done using simple statistical estimation (see, for example, [15]) or genetic algorithm optimization [16]. In the present study, we deal only with the determination of centers and weights. Suppose we have $N$ training data pairs $\{\mathbf{x}_k, y_k\}$, $k = 1, 2, \ldots, N$, where $y_k$ denotes the class label of the pattern $\mathbf{x}_k$. In the context of nonlinear approximation, a criterion for determining the unknown parameters is to minimize the mean squared error between the class label and the network output

$$J = \frac{1}{N}\sum_{k=1}^{N}\bigl(y_k - f(\mathbf{x}_k)\bigr)^2. \qquad (3)$$

Equation (3) is a typical optimization problem. Gradient-based nonlinear optimization can be used to solve this problem [1], but the gradient-based method may suffer from being trapped in local minima and from a long training time. An efficient approach to RBF training is to adopt the orthogonal forward regression algorithm [8], [9], which selects a subset of training examples as RBF centers. Initially, all the $N$ training samples are considered as candidate centers, $\mathbf{c}_i = \mathbf{x}_i$, $i = 1, 2, \ldots, N$. The response of the $i$th hidden layer neuron with respect to the $k$th input pattern is denoted by

$$\phi_{ki} = \exp\left(-\frac{\|\mathbf{x}_k-\mathbf{x}_i\|^2}{\sigma^2}\right). \qquad (4)$$

Substituting (4) into (1) yields

$$y_k = w_0 + \sum_{i=1}^{N} w_i \phi_{ki} + e_k. \qquad (5)$$

Selection of important RBF centers from the training set $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$ is equivalent to selection of important model terms for the model (5) from a given model term set $\{\phi_1, \phi_2, \ldots, \phi_N\}$ [8], [9]. Linear model terms can be selected using a forward regression procedure, which is a bottom-up method starting from an empty set (see, for example, [17]). This method builds the model using a multistep procedure that selects one model term at a time. The model term considered for inclusion at each step is the one that provides the most substantial contribution to approximation error reduction. The orthogonal forward regression algorithm [8], [9], which combines the orthogonal transform with the forward regression procedure, is a more efficient method than the conventional forward regression algorithm for selecting model terms from a large candidate term set. The advantage of employing the orthogonal transform is that the responses of the hidden layer neurons are decorrelated, so that the contribution of individual candidate neurons to the approximation error reduction can be evaluated independently and the selection procedure is facilitated. However, as pointed out in [18], the selected RBF centers are suboptimal. This is because the orthogonal forward regression procedure starts with fixed centers and searches for a combination of centers that best approximates the data [18]. In contrast, an optimal approach like principal component analysis allows the centers to be dependent on the data and finds the optimal set of centers corresponding to some prescribed criteria and constraints [18]. Another reason for the suboptimality of the orthogonal forward regression algorithm is that the center selection procedure decomposes the optimal search problem into multiple subproblems. Combination of these local optimal solutions may not be optimal in a global sense [19].

III. RBF CENTER SELECTION BASED ON FISHER RATIO CLASS SEPARABILITY MEASURE

The hidden layer of the RBF classifier can be viewed as a function that maps input patterns from a $d$-dimensional input space to an $m$-dimensional space

$$\boldsymbol{\phi}: \mathbb{R}^{d} \rightarrow \mathbb{R}^{m}, \qquad \boldsymbol{\phi}(\mathbf{x}) = \bigl[\phi_1(\mathbf{x}), \phi_2(\mathbf{x}), \ldots, \phi_m(\mathbf{x})\bigr]^{T} \qquad (6)$$

where $\phi_i(\mathbf{x}) = \exp\bigl(-\|\mathbf{x}-\mathbf{c}_i\|^2/\sigma^2\bigr)$.

In the new space, the responses of the hidden layer neurons, $\phi_1(\mathbf{x}_k), \phi_2(\mathbf{x}_k), \ldots, \phi_m(\mathbf{x}_k)$, form the new feature vector for the representation of the $k$th pattern $\mathbf{x}_k$. The discriminative power of these new features is determined by the RBF centers. To achieve good classification, the centers are best selected based on their ability to provide large class separation. This is the basic idea of our method; the details are described below.
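As an illustration of (1), (4), and the mapping (6), the following Python sketch computes the Gaussian hidden layer responses and the candidate response matrix used by the selection procedure developed in this section. It is a minimal sketch rather than the implementation used in this paper; the function name, the NumPy formulation, and the assumption of a single, user-supplied width sigma are illustrative.

```python
import numpy as np

def hidden_responses(X, centers, sigma):
    """Gaussian hidden-layer responses, one column per candidate center.

    X       : (N, d) training patterns
    centers : (M, d) candidate RBF centers (here, the training samples themselves)
    sigma   : scalar RBF width, assumed given (see [15], [16] for its estimation)
    Returns the (N, M) matrix with entries exp(-||x_k - c_i||^2 / sigma^2), cf. (4).
    """
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma ** 2)

# Taking every training sample as a candidate center gives the N x N
# candidate response matrix used later in (11):
# Phi = hidden_responses(X_train, X_train, sigma)
```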
A. Fisher Ratio for Class Separability Measure

Two measures related to class separation are the interclass difference and the intraclass spread. Let the mean and variance of the samples belonging to class 1 and class 2 in the direction of the $i$th feature be denoted by $m_{1,i}$, $m_{2,i}$ and $\sigma_{1,i}^2$, $\sigma_{2,i}^2$, respectively. Fisher ratio is defined as the ratio of the interclass difference to the intraclass spread [13]

$$J_i = \frac{(m_{1,i} - m_{2,i})^2}{\sigma_{1,i}^2 + \sigma_{2,i}^2} \qquad (7)$$

where $J_i$ denotes the class separation between classes 1 and 2 in the direction of the $i$th feature. Fisher ratio provides a good class separability measure because it is maximized when the interclass difference is maximized and the intraclass spread is minimized.

The definition of Fisher ratio in (7) can be extended to the multiclass case. In our study, class separability is evaluated using the average value over all class pairs

$$\bar{J}_i = \frac{2}{C(C-1)}\sum_{p=1}^{C-1}\sum_{q=p+1}^{C}\frac{(m_{p,i} - m_{q,i})^2}{\sigma_{p,i}^2 + \sigma_{q,i}^2} \qquad (8)$$

where $\bar{J}_i$ is the average class separability measure in the direction of the $i$th feature and $C$ is the total number of classes.

Fisher ratio was originally proposed for feature selection in linearly separable problems. In the present study, Fisher ratio can be applied to nonlinearly separable problems for RBF center selection. This is because the RBF mapping given in (6) maps patterns from the nonlinearly separable space into the linearly separable space.

B. RBF Center Selection Using Fisher Ratio, Orthogonal Transform, and Forward Selection Procedure

We shall investigate the role of the RBF centers in determining the class separability measure. In the new, linearly separable space, the mean value and the variance of the responses of the hidden layer neurons corresponding to the new features are given by

$$m_{c,i} = \frac{1}{N_c}\sum_{\mathbf{x}_k \in \text{class } c}\phi_i(\mathbf{x}_k) \qquad (9)$$

$$\sigma_{c,i}^2 = \frac{1}{N_c}\sum_{\mathbf{x}_k \in \text{class } c}\bigl(\phi_i(\mathbf{x}_k) - m_{c,i}\bigr)^2 \qquad (10)$$

where $N_c$ denotes the number of training samples that fall into class $c$.

Equations (9) and (10) reveal that the center vectors play a critical role in determining the mean and the variance, and hence the interclass difference and the intraclass spread. Provided that the training samples are representative, we can select a subset of the training samples as centers. The selection procedure consists of evaluating the class separability measure resulting from each of the candidate centers using Fisher ratio, and selecting the centers that provide large class separation using the forward selection procedure. However, there is a problem when implementing this idea. If $\|\mathbf{x}_i - \mathbf{x}_j\| \le \varepsilon$, where $\varepsilon$ is a small positive number, $\boldsymbol{\phi}_i$ and $\boldsymbol{\phi}_j$ would be severely correlated. As a result, redundant centers may be selected if the forward selection algorithm is used without any modification. This may result in a large network architecture, where the generalization capacity of the classifier may deteriorate. In our study, we alleviate this problem by introducing the orthogonal decomposition into the procedure of class separability evaluation. We define the hidden layer neuron response matrix as

$$\boldsymbol{\Phi} = \begin{bmatrix} \phi_{11} & \phi_{12} & \cdots & \phi_{1N} \\ \phi_{21} & \phi_{22} & \cdots & \phi_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{N1} & \phi_{N2} & \cdots & \phi_{NN} \end{bmatrix} \qquad (11)$$

where $\phi_{ki}$ is the response of the $i$th hidden layer neuron with respect to the $k$th input pattern $\mathbf{x}_k$. The components of the $i$th column $\boldsymbol{\phi}_i$ are the responses of the $i$th candidate neuron with respect to all the training patterns $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$, and the components of the $k$th row are the responses of all candidate neurons to the $k$th input pattern $\mathbf{x}_k$. Performing an orthogonal decomposition of the matrix $\boldsymbol{\Phi}$, we obtain

$$\boldsymbol{\Phi} = \mathbf{Q}\mathbf{R} \qquad (12)$$

where $\mathbf{Q} = [\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_N]$ is an orthogonal matrix. The columns $\mathbf{q}_i$ define orthogonal directions, and the components in each column define the distribution of the samples along that direction.

Based on different criteria, the decomposition in (12) can yield different orthogonal transforms. Principal component analysis (PCA) provides an orthogonal basis that leads to minimum loss of information resulting from dimension reduction. But the problem with the PCA method is that the resultant vector $\mathbf{q}_i$ is a linear combination of the columns of $\boldsymbol{\Phi}$. Consequently, $\mathbf{q}_i$ cannot be linked back to a single column of $\boldsymbol{\Phi}$. Rank revealing QR factorization and the orthogonal forward regression algorithm [8], [9] are orthogonal transforms in which individual columns of $\mathbf{Q}$ can be linked back to individual columns of $\boldsymbol{\Phi}$, and hence to individual hidden layer neurons. The rank revealing QR factorization is based on the size of the norm of the orthogonal vectors $\mathbf{q}_i$; the columns of $\mathbf{Q}$ are sorted in such an order that

$$\|\mathbf{q}_1\| \ge \|\mathbf{q}_2\| \ge \cdots \ge \|\mathbf{q}_N\| \qquad (13)$$

where $\|\mathbf{q}_i\|$ denotes the norm of $\mathbf{q}_i$.

In the framework of approximation, the orthogonal forward regression algorithm [8], [9] decomposes $\boldsymbol{\Phi}$ starting from the following regression equation:

$$\mathbf{y} = \boldsymbol{\Phi}\boldsymbol{\theta} + \mathbf{e} \qquad (14)$$

where $\mathbf{y} = [y_1, y_2, \ldots, y_N]^{T}$, $\boldsymbol{\theta}$ is the weight vector, and $\mathbf{e}$ is the approximation error vector. Substituting (12) into (14) yields

$$\mathbf{y} = \mathbf{Q}\mathbf{g} + \mathbf{e} \qquad (15)$$

where $\mathbf{g} = \mathbf{R}\boldsymbol{\theta}$.
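A minimal sketch of the class separability measure of (7)–(10) is given below. It assumes NumPy arrays, takes the pairwise average over classes as one plausible reading of the averaged measure in (8), and uses an illustrative helper name (fisher_ratio) that is not part of the original formulation.

```python
import numpy as np
from itertools import combinations

def fisher_ratio(col, labels):
    """Average pairwise Fisher ratio of one feature column, cf. (7)-(10).

    col    : (N,) responses of one (possibly orthogonalized) hidden neuron
    labels : (N,) integer class labels
    For each pair of classes the ratio (m_p - m_q)^2 / (var_p + var_q) is
    computed; the multiclass measure is taken here as the pairwise average.
    """
    col, labels = np.asarray(col, dtype=float), np.asarray(labels)
    ratios = []
    for p, q in combinations(np.unique(labels), 2):
        a, b = col[labels == p], col[labels == q]
        num = (a.mean() - b.mean()) ** 2      # interclass difference
        den = a.var() + b.var() + 1e-12       # intraclass spread (guarded)
        ratios.append(num / den)
    return float(np.mean(ratios))
```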
The orthogonal transform in [8], [9] is formed by sorting the $\mathbf{q}_i$, $i = 1, 2, \ldots, N$, in descending order of the error reduction ratio

$$\frac{g_1^2\,\mathbf{q}_1^{T}\mathbf{q}_1}{\mathbf{y}^{T}\mathbf{y}} \ge \frac{g_2^2\,\mathbf{q}_2^{T}\mathbf{q}_2}{\mathbf{y}^{T}\mathbf{y}} \ge \cdots \ge \frac{g_N^2\,\mathbf{q}_N^{T}\mathbf{q}_N}{\mathbf{y}^{T}\mathbf{y}}. \qquad (16)$$

In the present study, we propose an orthogonal transform that sorts the $\mathbf{q}_i$ in the following order:

$$\bar{J}(\mathbf{q}_1) \ge \bar{J}(\mathbf{q}_2) \ge \cdots \ge \bar{J}(\mathbf{q}_N) \qquad (17)$$

where $\bar{J}(\mathbf{q}_i)$ denotes the class separability measure provided by $\mathbf{q}_i$ and is obtained by applying operations (7)–(8) to $\mathbf{q}_i$. The motivation for introducing such an orthogonal transform is to select centers that provide maximum class separation. The RBF center selection procedure that combines the forward selection procedure, the orthogonal transform, and the Fisher ratio class separability measure is summarized as follows.

1) Initially, take all the training examples as candidate centers. Compute the responses of all candidate hidden layer neurons with respect to all training input patterns. Form the matrix $\boldsymbol{\Phi}$ as in (11) using the responses of all hidden layer neurons.

2) In the second step, estimate the sample mean and variance of each class along the directions determined by each column of $\boldsymbol{\Phi}$, and compute the class separability measure provided by each column of $\boldsymbol{\Phi}$ using (7)–(8). The column that provides the maximum class separability is selected as the first column of matrix $\mathbf{Q}$, and the corresponding neuron is selected as the first neuron added to the network structure.

3) In the third step, orthogonalize all remaining columns of $\boldsymbol{\Phi}$ with respect to all the columns of $\mathbf{Q}$. Estimate the sample mean and variance of each class along the directions determined by the orthogonalized columns, and compute the class separability measure provided by each of them. The one that yields the maximum class separability is selected as the next column of $\mathbf{Q}$, and the corresponding neuron is added to the network structure.

4) The procedure is continued until the class separability provided by the next selected neuron is smaller than a predefined threshold.

Once the hidden layer neurons are selected, the nonlinearly separable training patterns can be mapped into a linearly separable space. The weights that connect the hidden and output layers can then be determined using linear classifier design methods such as the linear least squares algorithm.

An important parameter in the center selection procedure is the threshold for step 4). The value of the threshold determines the number of centers to be selected and thus affects the network size and the performance of the classifier. A small threshold will lead to a large network structure, which tends to overfit and exhibits deteriorated generalization capacity. On the other hand, a large threshold will result in a small network structure, which carries the danger of underfitting. In our implementation, the selection procedure is terminated when the class separability improvement from adding the next selected center is smaller than 5% of the sum of the class separability of all previously selected neurons.

Center selection is a subset search problem with the objective of finding an optimal combination of centers that provides the largest class separability. Strictly speaking, exhaustive search is the only method that is guaranteed to find the optimal subset. However, since the exhaustive search method has to evaluate all possible combinations of all available candidate centers, it requires an enormous amount of computation when the number of training samples is very large. Therefore the forward selection procedure, though suboptimal, is more frequently used than exhaustive search. Also, despite its suboptimality, the forward selection procedure is capable of selecting a good combination of a small number of centers because the center selected at each step is the one that provides the largest class separability measure. The forward selection procedure is therefore employed and combined with the orthogonal transform in the present study.

The orthogonal forward search procedure developed in the present study is similar to the orthogonal forward regression algorithms developed in [8] and [9]. Both algorithms employ the orthogonal transform and the forward selection procedure. However, the two algorithms employ different evaluation criteria in the center selection process. The orthogonal forward regression algorithm [8], [9] is developed in the context of nonlinear approximation; it evaluates candidate centers based on the approximation error reduction. Our orthogonal forward selection algorithm is developed in the context of classification; it evaluates candidate centers based on the Fisher ratio class separability measure. One advantage of our center selection algorithm is that it can deal with heavily unbalanced classes because of the employment of Fisher ratio; it is therefore more suitable for classification tasks. As for the computational cost, our algorithm is similar to the orthogonal forward regression algorithm [8], [9].

IV. EXPERIMENTS

One synthetic example and two real-life problems from the UCI Repository of Machine Learning [20] and the Knowledge Discovery in Databases Archive [21] were used to test our algorithm.

A. Experiment 1

In the first experiment, a synthetic dataset was used to test our algorithm. The dataset has 64 samples, of which 40 fall into class 1 and the remaining 24 fall into class 2. In our study, half of the examples in each class were used for training and the remaining half were used for testing. As shown in Fig. 1, the samples are nonlinearly separable in the input space. An RBF neural network classifier was used to deal with the problem.

To construct the RBF classifier, we need to select the RBF centers in the first place. Initially, all 32 training samples were taken as candidate centers. Following the center selection procedure summarized in Section III-B, we performed the orthogonal decomposition and evaluated the class separability provided by each candidate neuron. The class separability measures provided by the individual candidate neurons are shown in Fig. 2. Inclusion of a third neuron would provide just a 3.5% improvement in class separability. Therefore, we selected two hidden layer neurons in this experiment. The role of the two RBF neurons is to map patterns from the nonlinearly separable input space into the new linearly separable feature space.
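As a concrete illustration of the selection procedure of Section III-B, as it is applied here to the 32 candidate centers, the following sketch performs the orthogonalized forward search with the 5% stopping rule, reusing the fisher_ratio helper sketched earlier. It is an illustrative reconstruction under the stated assumptions, not the code used to produce the reported results.

```python
import numpy as np

def select_centers(Phi, labels, rel_threshold=0.05):
    """Orthogonal forward selection of RBF centers by Fisher ratio (Sec. III-B).

    Phi    : (N, M) candidate response matrix as in (11)
    labels : (N,) class labels
    Returns the indices of the selected candidate neurons. Selection stops
    when the separability added by the next neuron falls below
    rel_threshold times the accumulated separability (the 5% rule, step 4).
    """
    residual = Phi.astype(float)          # columns, orthogonalized so far
    remaining = list(range(Phi.shape[1]))
    selected, total = [], 0.0
    while remaining:
        # Steps 2)-3): separability of each (orthogonalized) candidate column.
        scores = [fisher_ratio(residual[:, j], labels) for j in remaining]
        best = int(np.argmax(scores))
        gain = scores[best]
        if selected and gain < rel_threshold * total:
            break                         # step 4): improvement too small
        j = remaining.pop(best)
        q = residual[:, j].copy()
        selected.append(j)
        total += gain
        # Gram-Schmidt: remove the selected direction from remaining columns.
        qq = q @ q
        if qq > 0.0:
            for k in remaining:
                residual[:, k] -= (q @ residual[:, k]) / qq * q
    return selected
```

Once the centers are selected, the output weights can be estimated, for example, by applying a linear least squares solver (such as np.linalg.lstsq) to the selected columns of Phi augmented with a constant bias column, corresponding to the least squares step described in Section III-B.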
Fig. 1. Sample distribution in the input space for Experiment 1.
Fig. 2. Class separability measure yielded by individual candidate neurons for Experiment 1.
Fig. 3. Sample distribution in the new feature space for Experiment 1.
Fig. 4. Sample distribution in the new feature space (the algorithm in [8], [9]) for Experiment 1.
Fig. 5. Class separability measure provided by individual hidden layer neurons for Experiment 2.

The training patterns in the new feature space are shown in Fig. 3, where separate markers denote the training and test data of the two classes. Obviously, the patterns in the new space are linearly separable. As a matter of fact, our classifier with just one hidden layer neuron is able to achieve 100% correct classification over both the training and test datasets.

For comparison, we also used the orthogonal forward regression algorithm [8], [9] to select RBF centers. As shown in Fig. 4, samples in the new feature space are still inseparable if just two neurons are used. To achieve 100% correct classification over both the training and test datasets, three hidden layer neurons were needed when the algorithm in [8], [9] was used to select the RBF centers. In this experiment, our algorithm selected more efficient hidden layer neurons. This is achieved thanks to the employment of the Fisher ratio class separability measure, which can deal with the unbalanced class problem in this example.

B. Experiment 2

The Cleveland data on cardiology patients [20] was used in the second experiment. The dataset has 303 samples, where each example is represented by 13 input attributes and one output attribute. Since the information provided is incomplete, the estimated class error rate is 20% [22].

Among the 303 samples, only 297 are complete; the rest have one or more missing values. In our study, missing values were replaced with the mean of the corresponding attributes. The 303 samples were divided into a training dataset and a test dataset consisting of 152 and 151 samples, respectively. Only attributes 8, 9, 12, and 13 were selected as features for pattern classification, using the sequential forward feature selection method [23].

Initially, all 152 training samples were considered as candidate center vectors. The class separability measure provided by the individual candidate hidden layer neurons was evaluated; part of the evaluation is shown in Fig. 5. Notice that only three neurons provide significant class separation. Classification using different numbers of hidden layer neurons was performed. As shown in Figs. 6 and 7, the classification error rate decreases when more neurons are employed in the hidden layer. However, the reduction of the classification error is trivial when the number of neurons exceeds three. This number coincides with that from the analysis based on the class separability measure.

The resulting RBF neural network classifier has a 4-3-1 structure. The classification error rates over the training and test data are 13.2% and 19.7%, respectively.
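The mean imputation of missing attribute values described above can be realized as in the short sketch below. The sketch assumes that missing entries are encoded as NaN and that the 13 attributes are stored as columns of a NumPy array; the 0-based column indices shown for attributes 8, 9, 12, and 13 are an assumption about the ordering, given only for illustration.

```python
import numpy as np

def impute_mean(X):
    """Replace missing values (NaN) with the mean of the corresponding
    attribute, as done for the Cleveland data in Experiment 2."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

# Illustrative use: attributes 8, 9, 12, 13 (1-based) would map to
# columns 7, 8, 11, 12 of a 13-column array.
# X = impute_mean(X_raw)[:, [7, 8, 11, 12]]
```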
Fig. 6. Classification error rate versus number of hidden layer neurons for the training data of Experiment 2.
Fig. 7. Classification error rate versus number of hidden layer neurons for the test data of Experiment 2.
Fig. 8. Class separability measure provided by individual neurons for Experiment 3.
Fig. 9. Number of identified owners versus number of hidden layer neurons for Experiment 3.
TABLE I. COMPARISONS WITH OTHER METHODS FOR EXPERIMENT 2.

This result matches the expected error rate of 20% very well, and is better than the average error rate of 21.1% reported in [22].

To compare our method with the methods reported in [24], we performed a ten-fold cross validation. The 303 samples were divided into ten subsets of approximately equal size. The RBF neural network was trained ten times, each time leaving one of the ten subsets out of training and using only the omitted subset to compute the classification accuracy. A comparison of our result with those reported in [24] is listed in Table I. Obviously, our result is reasonably good.

C. Experiment 3

In this experiment, we tested our algorithm using the CoIL 2000 Insurance Company Benchmark [25]. The data was supplied by the Dutch data mining company Sentient Machine Research and was based on a real-world insurance problem. The training set contains 5822 samples, of which 348 are caravan insurance policy owners and 5474 are nonowners. The test dataset contains 4000 samples. The objectives of the study are to predict who would be interested in buying a caravan insurance policy and to explain the reason. In the CoIL 2000 Challenge, the first objective was approximated and simplified to finding the set of 800 customers in the test set of 4000 customers that contains the most caravan policy owners. Only this first problem was considered in the present study.

Each customer is described by 85 input attributes and one output attribute indicating whether or not they own a caravan insurance policy. Many of the 85 input attributes provide little information for discrimination. In our study, we selected 12 attributes as features using the sequential forward feature selection method [23].

Since there are over 5000 samples, extensive computation would be required if all training samples were considered as candidate hidden layer neurons. In our study, the samples used as candidate centers were the 348 caravan insurance policy owners and 300 customers randomly sampled from the 5474 nonowners. The class separability of the candidate hidden layer neurons is shown in Fig. 8. Obviously, only six hidden layer neurons provide good class separation, and these were selected to construct the classifier.

We then tested our classifier using the 4000 test samples. As required by the CoIL 2000 Challenge, we need to select the set of 800 most probable caravan policy owners from the 4000 test customers based on the classification results of our algorithm. The number of correctly identified policy owners is shown in Fig. 9. With the six hidden layer neurons selected based on the class separability measure, our algorithm correctly identified 118 caravan policy owners.
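The CoIL task of returning the 800 most probable policy owners amounts to ranking the test customers by the classifier output f(x) of (1) and keeping the top 800. A minimal sketch, with illustrative names, follows.

```python
import numpy as np

def top_k_candidates(scores, k=800):
    """Return the indices of the k test customers with the largest
    classifier outputs; the CoIL 2000 task counts how many true policy
    owners fall in this set. `scores` would be the raw RBF outputs f(x)."""
    return np.argsort(scores)[::-1][:k]

# Example: hits = np.isin(top_k_candidates(f_test), owner_indices).sum()
```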
Fig. 10. Results of other solutions for Experiment 3.

Our result compares well with those reported in [25], where the best solution correctly identified 121 policy owners and the second best identified 115 policy owners, as shown in Fig. 10.

In this experiment, the samples used as candidate centers included all 348 policy owners and 300 customers randomly selected from the 5474 nonowners. Since random selection was involved, we repeatedly performed the RBF center selection using different candidate center combinations. The results obtained were similar to those reported above. This again validates the effectiveness of our method.

V. CONCLUDING REMARKS

In this study, we have presented a new algorithm for RBF center selection. The basic idea of our algorithm is to select RBF centers based on the Fisher ratio class separability measure and to employ an orthogonal transform to facilitate the center selection procedure. The algorithm has been tested on one synthetic dataset and two real-life problems. The results have shown that our algorithm can always find a parsimonious network structure and is capable of finding hidden layer neurons that yield large class separation.

ACKNOWLEDGMENT

The author would like to thank Dr. K.-A. Toh for helpful discussions. The author would also like to thank the anonymous reviewers for their constructive comments to improve the paper.

REFERENCES

[1] T. Poggio and F. Girosi, "Networks for approximation and learning," Proc. IEEE, vol. 78, pp. 1481–1497, Sept. 1990.
[2] S. Elanayar and Y. C. Shin, "Radial basis function neural network for approximation and estimation of nonlinear stochastic dynamic systems," IEEE Trans. Neural Networks, vol. 5, pp. 594–603, July 1994.
[3] W. Kaminski and P. Strumillo, "Kernel orthogonalization in radial basis function neural networks," IEEE Trans. Neural Networks, vol. 8, pp. 1177–1183, Sept. 1997.
[4] J. Moody and C. Darken, "Fast learning in networks of locally-tuned processing units," Neural Comput., vol. 1, pp. 281–294, 1989.
[5] T. Kohonen, Self-Organizing Maps. Berlin, Germany: Springer-Verlag, 1995.
[6] W. Pedrycz, "Conditional fuzzy clustering in the design of radial basis function neural networks," IEEE Trans. Neural Networks, vol. 9, pp. 601–612, July 1998.
[7] Z. Uykan, C. Guzelis, M. E. Celebi, and H. N. Koivo, "Analysis of input-output clustering for determining centers of RBFN," IEEE Trans. Neural Networks, vol. 11, pp. 851–858, July 2000.
[8] S. Chen, C. F. Cowan, and P. M. Grant, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Trans. Neural Networks, vol. 2, pp. 302–309, 1991.
[9] S. Chen, S. A. Billings, and P. M. Grant, "Recursive hybrid algorithm for nonlinear system identification using radial basis function networks," Int. J. Contr., vol. 55, no. 5, pp. 1051–1070, 1992.
[10] J. B. Gomm and D. L. Yu, "Selecting radial basis function network centers with recursive orthogonal least squares training," IEEE Trans. Neural Networks, vol. 11, pp. 306–314, Mar. 2000.
[11] B. Schölkopf, K. K. Sung, C. J. C. Burges, F. Girosi, P. Niyogi, T. Poggio, and V. N. Vapnik, "Comparing support vector machines with Gaussian kernels to radial basis function classifiers," IEEE Trans. Signal Processing, vol. 45, pp. 2758–2765, Nov. 1997.
[12] V. N. Vapnik, "An overview of statistical learning theory," IEEE Trans. Neural Networks, vol. 10, pp. 988–999, Sept. 1999.
[13] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[14] C. S. Cruz and J. Dorronsoro, "A nonlinear discriminant algorithm for feature extraction and data classification," IEEE Trans. Neural Networks, vol. 9, pp. 1370–1376, 1998.
[15] S. Haykin, Neural Networks: A Comprehensive Foundation. New York: Macmillan, 1994.
[16] S. Chen, Y. Wu, and B. L. Luk, "Combined genetic algorithm optimization and regularized orthogonal least squares learning for radial basis function networks," IEEE Trans. Neural Networks, vol. 10, pp. 1239–1243, Sept. 1999.
[17] R. F. Gunst, Regression Analysis and Its Applications: A Data-Oriented Approach. New York: Marcel Dekker, 1980.
[18] A. Sherstinsky and R. W. Picard, "On the efficiency of the orthogonal least squares training method for radial basis function networks," IEEE Trans. Neural Networks, vol. 7, pp. 195–200, Jan. 1996.
[19] K. Z. Mao and S. A. Billings, "Algorithms for minimal model structure detection in nonlinear dynamic system identification," Int. J. Contr., vol. 68, no. 2, pp. 311–330, 1997.
[20] C. L. Blake and C. J. Merz. (1998) UCI Repository of Machine Learning Databases. Univ. California, Dept. Inform. Comput. Sci., Irvine, CA. [Online]. Available: http://www.ics.uci.edu/mlearn/Machine-Learning.html
[21] S. D. Bay. (1999) The UCI KDD Archive. Univ. California, Dept. Inform. Comput. Sci., Irvine, CA. [Online]. Available: http://kdd.ics.uci.edu
[22] J. H. Gennari, P. Langley, and D. Fisher, "Models of incremental concept formation," Artificial Intell., vol. 40, no. 1, pp. 11–61, 1992.
[23] P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. London, U.K.: Prentice-Hall, 1982.
[24] N. Jankowski. Datasets Used for Classification: Comparison of Results: Cleveland Heart Disease. [Online]. Available: http://www.phys.uni.torun.pl/kmk/projects/datasets.html
[25] P. van der Putten and M. van Someren, Eds., CoIL Challenge 2000: The Insurance Company Case. Amsterdam, The Netherlands: Sentient Machine Res. Publication, 2000.

You might also like