We propose the Extended Nearest Neighbor (ENN) method to solve this deficiency in the classic KNN method. The advantages of the original KNN approach are retained in our ENN classifier, such as easy implementation and competitive classification performance. However, unlike the classic KNN approach, the new ENN classifier makes a prediction by not only considering who are the nearest neighbors of the test sample, but also who consider the test sample as their nearest neighbors.

A. ENN Classifier
We describe our ENN method starting with the two-class classification problem. The generalized statistic based on nearest neighbors was first proposed in the two-sample test problem to evaluate whether two distributions are mixed well or widely separated. For the training set alone, the generalized class-wise statistic of class $i$ is $T_i = \frac{1}{n_i k}\sum_{x \in S_i}\sum_{r=1}^{k} I_r(x, S)$, where $S_1$ and $S_2$ denote the sample sets of class 1 and class 2, respectively, $x$ denotes one single sample in $S = S_1 \cup S_2$, $n_i$ is the number of samples in $S_i$, and $k$ is the user-defined parameter of the number of nearest neighbors. The indicator function $I_r(x, S)$ indicates whether both the sample $x$ and its $r$-th nearest neighbor belong to the same class, defined as follows:

$$I_r(x, S) = \begin{cases} 1, & \text{if } x \in S_i \text{ and } \mathrm{NN}_r(x, S) \in S_i \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

where $\mathrm{NN}_r(x, S)$ denotes the $r$-th nearest neighbor of $x$ in $S$. This equation means that, for either class, if both the sample $x$ and its $r$-th nearest neighbor in the pool of $S$ belong to the same class, then the outcome of the indicator function is one; otherwise it is zero.

When a test sample $Z$ is introduced and assumed to belong to class $j$, the generalized class-wise statistics are recomputed over the augmented sample sets:

$$T_i^j = \frac{1}{n_i' k}\sum_{x \in S_i^j}\sum_{r=1}^{k} I_r\big(x, S_1 \cup S_2 \cup \{Z\}\big), \qquad i, j = 1, 2, \qquad (4)$$

where $n_i'$ is the size of $S_i^j$ and $S_i^j$ is defined as

$$S_i^j = \begin{cases} S_i \cup \{Z\}, & \text{when } j = i \\ S_i, & \text{when } j \neq i \end{cases} \qquad (5)$$

For this two-class classification problem, we have four generalized class-wise statistics: $T_1^1$, $T_2^1$, $T_1^2$ and $T_2^2$. Unlike the previous KNN classifiers, which predict that the unknown sample $Z$ belongs to a class based only on its nearest neighbors, the ENN classifier predicts its class membership according to the following target function:

$$f_{\mathrm{ENN}} = \arg\max_{j \in \{1, 2\}} \sum_{i=1}^{2} T_i^j \qquad (6)$$
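As a concrete illustration of Eqs. (4)–(6), the following minimal NumPy sketch recomputes every class-wise statistic under each assumed label of $Z$ by brute force and returns the arg-max class. The function names and the brute-force pairwise-distance computation are ours, not part of the original method description.

```python
import numpy as np

def class_wise_statistic(X, y, cls, k):
    """Generalized class-wise statistic of class `cls` over the pool X
    (Eqs. (2) and (4)), computed by brute force."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a sample is never its own neighbor
    members = np.where(y == cls)[0]
    hits = 0
    for i in members:
        knn = np.argsort(d[i])[:k]           # indices of the k nearest neighbors
        hits += np.sum(y[knn] == cls)        # sum_r I_r(x, S), Eq. (2)
    return hits / (len(members) * k)

def enn_predict(X_train, y_train, z, k=3):
    """ENN target function, Eq. (6): assume z belongs to each class j in turn,
    recompute all T_i^j over S united with {z}, and keep the j that maximizes
    the total intra-class coherence."""
    classes = np.unique(y_train)
    scores = {}
    for j in classes:
        X_aug = np.vstack([X_train, z])       # augmented sample pool S'
        y_aug = np.append(y_train, j)         # z temporarily labeled as class j
        scores[j] = sum(class_wise_statistic(X_aug, y_aug, c, k) for c in classes)
    return max(scores, key=scores.get)
```

Because each query rebuilds all neighborhoods from scratch, this sketch only mirrors the definitions; the ENN.V1 form discussed below avoids the recomputation.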
In other words, the ENN classifier assigns the test sample to the decision resulting in the largest intra-class coherence when we iteratively assume the test sample $Z$ to be each possible class.

We now present the detailed calculation steps for a two-class classification problem using our ENN method. First, let's assume that the observation $Z$ should be classified as class 1. Then, for each training sample, we re-calculate its $k$ nearest neighbors and obtain the new generalized class-wise statistics according to Eq. (4), denoted as $T_1^1$ for class 1 and $T_2^1$ for class 2. Second, we assume that the observation $Z$ should be classified as class 2; then, for each training sample, we again re-calculate its $k$ nearest neighbors and obtain the new generalized class-wise statistics according to Eq. (4), denoted as $T_1^2$ for class 1 and $T_2^2$ for class 2. Then a classification decision is made according to Eq. (6).

The aforementioned two-class ENN decision rule can be easily extended to multi-class classification problems. Specifically, for an $N$-class classification problem, we have an ENN Classifier algorithm (see the box below), whose target function, Eq. (8), generalizes Eq. (6) to $f_{\mathrm{ENN}} = \arg\max_{j \in \{1, \ldots, N\}} \sum_{i=1}^{N} T_i^j$.

B. ENN.V1 Classifier: An Equivalent Version of ENN
The underlying idea of the ENN rule shown in Eq. (8) is that we classify the test sample $Z$ based on which class assumption results in the largest intra-class coherence. The decision of $Z$ in our ENN rule depends not only on its $k_1$ nearest neighbors from class 1 and its $k_2$ nearest neighbors from class 2, where $k_1 + k_2 = k$ is the total number of nearest neighbors investigated, but also on how the $k$-nearest neighborhood of each training sample changes once $Z$ is added. First, let's assume the observation $Z$ to be classified as class 1; then, we count the number of class 1 data that have an increased number of class 1 samples among their $k$ nearest neighbors (denote this number as $\Delta n_1^1$), and we also count the number of class 2 data that have a decreased number of class 2 samples among their $k$ nearest neighbors (denote this number as $\Delta n_2^1$). Then, in a similar way, we further assume the observation $Z$ to be a member of class 2, and count the number of class 1 data that have a decreased number of class 1 samples among their $k$ nearest neighbors (denote this number as $\Delta n_1^2$), and the number of class 2 data that have an increased number of class 2 samples among their $k$ nearest neighbors (denote this number as $\Delta n_2^2$). In this way, $\Delta n_i^j$ represents how many of the $k$ nearest neighbors from each class will change because of the introduction of the new sample $Z$, when it is iteratively assumed to be each possible class. To clearly demonstrate this concept, we present a detailed calculation example in Supplementary Material Section 1 for a two-class classification problem.

With these discussions, we present an equivalent version of the ENN classifier, which reaches the same decision without recomputing the generalized class-wise statistics from scratch:

$$f_{\mathrm{ENN.V1}} = \arg\max_{j \in \{1, 2, \ldots, N\}} \left[ \frac{\Delta n_j^j + k_j - k T_j}{(n_j + 1)k} - \sum_{i=1,\, i \neq j}^{N} \frac{\Delta n_i^j}{n_i k} \right] \qquad (9)$$

where $k_j$ denotes the number of the $k$ nearest neighbors of $Z$ that belong to class $j$. To see the equivalence, note that when $i = j$,

$$\begin{aligned}
T_i^j &= \frac{1}{(n_i+1)k}\left[\sum_{X\in S_i}\sum_{r=1}^{k} I_r\big(X, S_1\cup S_2\cup\{Z\}\big) + \sum_{r=1}^{k} I_r\big(Z, S_1\cup S_2\cup\{Z\}\big)\right]\\
&= \frac{1}{(n_i+1)k}\left[\sum_{X\in S_i}\sum_{r=1}^{k} I_r\big(X, S_1\cup S_2\big) + \Delta n_i^j + \sum_{r=1}^{k} I_r\big(Z, S_1\cup S_2\big)\right]\\
&= \frac{n_i k T_i + \Delta n_i^j + k_i}{(n_i+1)k}, \qquad (10)
\end{aligned}$$

and when $i \neq j$,

$$T_i^j = \frac{1}{n_i' k}\sum_{X\in S_i}\sum_{r=1}^{k} I_r\big(X, S_1\cup S_2\cup\{Z\}\big)
= \frac{1}{n_i k}\left[\sum_{X\in S_i}\sum_{r=1}^{k} I_r\big(X, S_1\cup S_2\big) - \Delta n_i^j\right]
= T_i - \frac{\Delta n_i^j}{n_i k}. \qquad (11)$$

Therefore, from Eq. (8), we have

$$f_{\mathrm{ENN}} = \arg\max_{j\in\{1,\ldots,N\}}\sum_{i=1}^{N} T_i^j
= \arg\max_{j\in\{1,\ldots,N\}}\left[\big(T_j^j - T_j\big) + \sum_{i=1,\, i\neq j}^{N}\big(T_i^j - T_i\big)\right]
= f_{\mathrm{ENN.V1}},$$

since subtracting the constant $\sum_{i=1}^{N} T_i$ does not change the maximizing class, and substituting Eqs. (10) and (11) into the bracket yields exactly the score of Eq. (9).
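The update rules (10) and (11) are simple enough to express directly; two small helper functions (names ours) make the bookkeeping explicit:

```python
def updated_same_class(T_i, dn_ii, k_i, n_i, k):
    """Eq. (10): statistic of class i when Z is assumed to belong to class i."""
    return (n_i * k * T_i + dn_ii + k_i) / ((n_i + 1) * k)

def updated_other_class(T_i, dn_ij, n_i, k):
    """Eq. (11): statistic of class i when Z is assumed to belong to a class j != i."""
    return T_i - dn_ij / (n_i * k)
```

Summing these over $i$ for each assumed class $j$, and dropping the constant $\sum_i T_i$, recovers the ENN.V1 score of Eq. (9).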
[Figure: worked ENN example with $k = 3$, class 1 samples $X_1$–$X_4$, class 2 samples $Y_1$–$Y_4$, and test sample $Z$. (A) 3-nearest neighbors without $Z$: $T_1 = 0.750$, $T_2 = 0.583$. (B) Assuming $Z$ belongs to class 1: $T_1^1 = 0.800$, $T_2^1 = 0.500$, with $\Delta n_1^1 = 1$, $\Delta n_2^1 = 1$, $k_1 = 2$, $k_2 = 1$. (C) Assuming $Z$ belongs to class 2: $T_1^2 = 0.750$, $T_2^2 = 0.733$, with $\Delta n_1^2 = 0$, $\Delta n_2^2 = 3$. According to Eq. (8), $T_1^1 + T_2^1 = 1.300 < T_1^2 + T_2^2 = 1.483$; according to Eq. (9), the two scores are $-0.03$ and $0.15$. Both rules assign $Z$ to class 2.]
Both of them provide the same classification result, but the recalculation of the generalized class-wise statistics $T_i^j$ is avoided in ENN.V1.
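Plugging the example's values ($n_1 = n_2 = 4$, $k = 3$) into Eq. (9) reproduces the two scores quoted above; under our reading of the notation, the intermediate arithmetic is

$$\begin{aligned}
j = 1:\quad & \frac{\Delta n_1^1 + k_1 - k T_1}{(n_1 + 1)k} - \frac{\Delta n_2^1}{n_2 k} = \frac{1 + 2 - 3(0.750)}{5 \cdot 3} - \frac{1}{4 \cdot 3} = 0.05 - 0.083 \approx -0.03,\\
j = 2:\quad & \frac{\Delta n_2^2 + k_2 - k T_2}{(n_2 + 1)k} - \frac{\Delta n_1^2}{n_1 k} = \frac{3 + 1 - 3(0.583)}{5 \cdot 3} - \frac{0}{4 \cdot 3} = 0.15,
\end{aligned}$$

so ENN.V1 also assigns $Z$ to class 2.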
In the classic KNN method, there is no training stage (i.e., the so-called instance-based learning or lazy learning), with a computational complexity of $O(M \log M)$ in the testing stage for every test sample, where $M$ is the number of training data. To improve the efficiency, it makes more sense to take a preprocessing stage where data structures are built to capture the relationship among training data, such as the k-d tree [24], ball tree [27], and nearest feature line [25], or to reduce the size of the training data by eliminating the data which may not provide useful information for decision making, such as the condensed nearest neighbor rule, which uses a concept of mutual nearest neighborhood to select only samples close to the decision boundary [22].

In our ENN method, a preprocessing stage can be developed to calculate the generalized class-wise statistic $T_i$ for each class and to build weighted KNN graphs, in which training samples are the vertices and the distances of each sample to its nearest neighbors are the edges. The weighted KNN graphs can then be used to calculate $\Delta n_i^j$ efficiently for a given test sample. To do so, we calculate the distances between the test sample and every training sample to obtain $k_i$ for each class at a computational complexity of $O(M \log M)$. After that, we compare these distances against the weighted KNN graphs to obtain $\Delta n_i^j$ with a computational complexity of only $O(M)$. Therefore, the total computational complexity of our proposed ENN method is $O(M \log M) + O(M)$, which is on the same scale as the $O(M \log M)$ of the KNN method. Notice that we have a computational complexity of $O(M^2 \log M)$ in the preprocessing stage of building the weighted KNN graphs if we perform it in a direct manner (i.e., for each training sample, we calculate and order the mutual distances to all the other training data to search for its nearest neighbors). Fortunately, the existing techniques proposed in the literature to improve the efficiency of KNN-based methods can be easily integrated into our ENN method to speed up the search for nearest neighbors, and hence reduce the complexity of the ENN method. For example, using the structure of k-d trees [24] can reduce the computational complexity of the preprocessing stage and the testing stage to $O(M \log M)$ and $O(\log M)$, respectively.
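A sketch of this preprocessing idea is given below, assuming SciPy's cKDTree is available; the class name, its attributes, and the handling of edge cases (duplicate points, ties at the k-th-neighbor radius) are our assumptions, not part of the article. The per-sample neighbor lists and k-th-neighbor radii are built once, after which each query only needs the distances from the test point to the training data.

```python
import numpy as np
from scipy.spatial import cKDTree

class ENNV1:
    """ENN.V1 with a precomputed weighted KNN graph (Eqs. (9)-(11))."""

    def __init__(self, X, y, k=3):
        self.X, self.y, self.k = np.asarray(X, float), np.asarray(y), k
        self.classes = np.unique(self.y)
        self.n = {c: int(np.sum(self.y == c)) for c in self.classes}
        tree = cKDTree(self.X)
        # k + 1 neighbors because each point is returned as its own 0-distance neighbor.
        dist, idx = tree.query(self.X, k=k + 1)
        self.nn_idx = idx[:, 1:]                   # k nearest neighbors of every training sample
        self.kth_radius = dist[:, -1]              # distance to the k-th neighbor (edge weights)
        self.kth_same = self.y[self.nn_idx[:, -1]] == self.y   # is that k-th neighbor same-class?
        same = self.y[self.nn_idx] == self.y[:, None]
        self.T = {c: float(same[self.y == c].mean()) for c in self.classes}   # original T_i

    def predict(self, z):
        d = np.linalg.norm(self.X - np.asarray(z, float), axis=1)
        nearest = np.argsort(d)[:self.k]
        k_count = {c: int(np.sum(self.y[nearest] == c)) for c in self.classes}   # k_i of z
        affected = d < self.kth_radius             # samples whose neighborhood z would enter
        scores = {}
        for j in self.classes:
            s = 0.0
            for i in self.classes:
                if i == j:
                    # Delta n_i^j: class-i samples gaining a same-class neighbor
                    # (z, of class i here, displaces a different-class k-th neighbor).
                    dn = int(np.sum(affected & (self.y == i) & ~self.kth_same))
                    s += (dn + k_count[i] - self.k * self.T[i]) / ((self.n[i] + 1) * self.k)
                else:
                    # Delta n_i^j: class-i samples losing a same-class neighbor
                    # (z, of class j != i, displaces their same-class k-th neighbor).
                    dn = int(np.sum(affected & (self.y == i) & self.kth_same))
                    s -= dn / (self.n[i] * self.k)
            scores[j] = s
        return max(scores, key=scores.get)
```

Usage would look like `clf = ENNV1(X_tr, y_tr, k=7)` followed by `clf.predict(x)` for each test point.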
IV. Experiments and Results Analysis
To evaluate the performance of our ENN classifier, we conduct several experiments with Gaussian data classification, hand-written digits classification [34], and twenty real-world datasets from the UCI Machine Learning Repository [35]. In all these experiments, we explore every odd number of $k$ from 1 to 21, i.e., $k = 1, 3, 5, 7, \ldots, 21$.

We first test our proposed ENN classifier on 3-dimensional Gaussian data with three classes, in comparison with the classic KNN rule and the Maximum a Posteriori (MAP) rule. In the MAP rule, we estimate the parameters of the Gaussian distribution for each class from the given training data. In our current experiments, we use 250 training samples and 250 test samples for each class with the following parameters:

$$\mu_1 = [8\ 5\ 5]^T,\ \Sigma_1 = \sigma_1^2 I; \qquad
\mu_2 = [5\ 8\ 5]^T,\ \Sigma_2 = \sigma_2^2 I; \qquad
\mu_3 = [5\ 5\ 8]^T,\ \Sigma_3 = \sigma_3^2 I.$$

To demonstrate the effectiveness of the proposed ENN classifier in solving the deficiency of the KNN rule, we examine the classification error rate of each class under different variances. Fig. 3 shows the overall classification error rates (in percentage) averaged over 100 random runs. The detailed classification performance for each class among the three comparative methods, i.e., ENN, KNN, and MAP, is provided in Supplementary Material Section 2. Our results show that the proposed ENN method performs much better than the classic KNN method.

[Figure 3: Overall classification error rate (in percentage) of ENN and KNN for four Gaussian data models. Model 1: $\sigma_1^2 = 5$, $\sigma_2^2 = 5$, $\sigma_3^2 = 5$; Model 2: $\sigma_1^2 = 5$, $\sigma_2^2 = 20$, $\sigma_3^2 = 5$; Model 3: $\sigma_1^2 = 5$, $\sigma_2^2 = 5$, $\sigma_3^2 = 20$; Model 4: $\sigma_1^2 = 5$, $\sigma_2^2 = 20$, $\sigma_3^2 = 20$.]
We also evaluate the classification performance of our proposed ENN classifier on the entire MNIST hand-written digits dataset [34], which is a widely used benchmark in the community. In this experiment, we use 60,000 images as training data and 10,000 images as test data. Fig. 4 shows the comparison of classification error rates (in percentage) between the classic KNN rule and our ENN rule, where the minimum error of 2.61% is obtained at $k = 7$ for our ENN method. The overall classification error rates using the ENN rule are notably lower than those of the KNN rule (Fig. 4).

[Figure 4: Overall classification error rate (in percentage) of ENN and KNN for handwritten digits classification.]

We further apply our ENN classifier to twenty real-world datasets from the UCI Machine Learning Repository [35]. Table I presents the classification error rates (in percentage) for these twenty UCI datasets in comparison to the classic KNN, naive Bayes, linear discriminant analysis (LDA), and neural network classifiers. It shows that ENN always performs better than KNN, and in 17 out of these 20 datasets the performance improvement is significant (one-tailed t-test, $p = 0.01$). In the neural network implementation, we use the classic multilayer perceptron (MLP) structure with 10 hidden neurons and 800 back-propagation iterations for training at a learning rate of 0.01. The results are averaged over 100 random runs, and in every run we randomly select half of the data as the training data and the remaining half as the test data.

Let's consider the Spam database as an example. This database includes 4601 e-mail messages, of which 2788 are legitimate messages and 1813 are spam messages. Each message is represented by 57 attributes, of which 48 are the frequency of a particular word (FW), 6 are based on the frequency of a particular character (FC), and 3 are continuous attributes that reflect the use of capital letters (SCL) in the e-mails. Fig. 5 shows the detailed classification error rates (in percentage) for different values of the parameter $k$, which clearly demonstrates that the ENN method can achieve consistently lower error rates than those of the KNN rule. For a detailed performance comparison between ENN and KNN on all twenty datasets, please refer to Supplementary Material Section 3.

[Figure 5: Overall classification error rate (in percentage) of ENN and KNN for spam e-mail classification.]

We would like to note that in all these experiments there appears to be no significant difference between ENN and KNN when $k = 1$. The reason might be that with such a small value, $k = 1$, the classification performance is determined by the single closest neighbor alone. Therefore, when $k = 1$, neither method takes the data distribution into account, which might explain why both methods show very close performance in this case.
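The evaluation protocol above (100 random half/half splits, significance assessed with a one-tailed t-test) can be sketched as follows. The helper names and the choice of a paired test over the per-run errors are our assumptions, since the article does not specify the exact test variant.

```python
import numpy as np
from scipy import stats

def compare_over_runs(X, y, fit_predict_a, fit_predict_b, runs=100, seed=0):
    """Repeated 50/50 random splits: each fit_predict_*(X_tr, y_tr, X_te)
    returns test predictions. Reports mean error rates and a one-tailed
    paired t-test p-value for H1: classifier a has the lower error."""
    rng = np.random.default_rng(seed)
    err_a, err_b = [], []
    for _ in range(runs):
        perm = rng.permutation(len(y))
        half = len(y) // 2
        tr, te = perm[:half], perm[half:]
        err_a.append(np.mean(fit_predict_a(X[tr], y[tr], X[te]) != y[te]))
        err_b.append(np.mean(fit_predict_b(X[tr], y[tr], X[te]) != y[te]))
    t_stat, p_two = stats.ttest_rel(err_a, err_b)
    p_one = p_two / 2 if t_stat < 0 else 1 - p_two / 2   # one-tailed version
    return float(np.mean(err_a)), float(np.mean(err_b)), float(p_one)
```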