ABSTRACT
Twin support vector machine (TWSVM) is an important machine learning method whose objective is to construct two nonparallel hyperplanes such that each hyperplane is closer to one of the two classes and as far as possible from the other class. As TWSVM solves two smaller quadratic programming problems (QPPs), it works faster than the standard support vector machine. However, the method does not consider the importance of different data samples, and the QPPs are still solved by the Lagrange multiplier method. In this paper, we study two fuzzy twin support vector machines, FTWSVM and v-FTWSVM, solved by the successive over-relaxation (SOR) iterative method. Experiments are conducted for both FTWSVM and v-FTWSVM on several UCI datasets. The results indicate that the speed and accuracy of the successive over-relaxation iterative method for fuzzy twin support vector machines are superior to those of the traditional solution method.
Keywords: twin support vector machine; fuzzy twin support vector machine; successive over-relaxation; Lagrange multiplier
1. INTRODUCTION
The support vector machine is a machine learning method proposed by Vapnik et al., based on the theory of the VC (Vapnik-Chervonenkis) dimension and the structural risk minimization principle. It has become a hot topic in machine learning and is widely applied in many fields. The support vector machine is formulated as an optimization problem, which is solved through the Lagrange multipliers of a quadratic programming problem. However, solving this quadratic programming problem has high computational complexity when facing large amounts of data. To address this, researchers have presented many efficient learning algorithms and models [1][2]. Recently, Jayadeva et al. [3] proposed the twin support vector machine (TWSVM), which constructs two non-parallel hyperplanes by solving two smaller convex quadratic programming problems. Shao et al. [4] proposed an improved twin support vector machine by introducing a regularization term into TWSVM. Peng [5] proposed a v-twin support vector machine (v-TSVM). Afterwards, researchers gave several reviews of twin support vector machines [6]-[8]. However, neither TWSVM nor v-TSVM considers the influence of different samples on the optimal classification hyperplanes; that is, these methods treat noise points as normal data points. Under this circumstance, we considered the influence of different samples on the optimal classification hyperplanes for v-TSVM and proposed a fuzzy twin support vector machine [9]. Li et al. [10] proposed a fuzzy twin support vector machine based on TWSVM.
It may be seen that, whether for the traditional support vector machine or the twin support vector machines described above, a quadratic programming problem must be solved in order to obtain the Lagrange multipliers. In this paper, we study fuzzy twin support vector machines based on an iterative method in order to reduce their training time.
The rest of this paper is organized as follows. Section 2 introduces twin support vector machines, including TWSVM and v-TSVM. Fuzzy twin support vector machines and their iterative method are introduced in Section 3. Experiments are conducted in Section 4. The conclusion is drawn in Section 5.
2. TWIN SUPPORT VECTOR MACHINE
Assume a sample set T = {(x_i, y_i) | i = 1, ..., m}, where x_i ∈ R^n and y_i ∈ {−1, +1} (i = 1, ..., m). Matrices A ∈ R^{m1×n} and B ∈ R^{m2×n} denote all of the data points in class +1 and class −1, respectively, and m1 + m2 = m.
The twin support vector machine seeks two non-parallel hyperplanes w^(1)T x + b^(1) = 0 and w^(2)T x + b^(2) = 0 by solving two smaller quadratic programming problems such that each hyperplane is closer to the data points of one class and far from the data points of the other class, where w^(1) ∈ R^n, w^(2) ∈ R^n, b^(1) ∈ R and b^(2) ∈ R. The primal problem of the twin support vector machine is expressed as
min_{w^(1), b^(1), ξ}  (1/2)(Aw^(1) + e_1 b^(1))^T (Aw^(1) + e_1 b^(1)) + c_1 e_2^T ξ
s.t.  −(Bw^(1) + e_2 b^(1)) + ξ ≥ e_2,  ξ ≥ 0,      (1)

min_{w^(2), b^(2), η}  (1/2)(Bw^(2) + e_2 b^(2))^T (Bw^(2) + e_2 b^(2)) + c_2 e_1^T η
s.t.  (Aw^(2) + e_1 b^(2)) + η ≥ e_1,  η ≥ 0,      (2)
where c_1 and c_2 are positive parameters and e_1 and e_2 are vectors of ones of appropriate dimensions. The dual problem of (1) is as follows:

max_α  e_2^T α − (1/2) α^T G (H^T H)^{−1} G^T α
s.t.  0 ≤ α ≤ c_1,

where H = [A e_1], G = [B e_2], and let u = [w^(1), b^(1)]^T. With simple algebraic computation, u = −(H^T H)^{−1} G^T α is obtained. Although H^T H is always positive semidefinite, it may not be well conditioned in some situations. Thus a regularization term εI (ε > 0) is introduced to deal with possible ill-conditioning of H^T H, where I is an identity matrix of appropriate dimensions. Therefore, the augmented vector u is given by u = −(H^T H + εI)^{−1} G^T α.
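The closed-form recovery of u from a dual solution can be sketched numerically. In the snippet below the toy data and the dual vector alpha are made-up illustrations, not values from the paper:

```python
import numpy as np

# Hypothetical toy data: class +1 samples in A, class -1 samples in B
A = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8]])
B = np.array([[-1.0, -1.0], [-1.2, -0.7]])

H = np.hstack([A, np.ones((A.shape[0], 1))])  # H = [A e1]
G = np.hstack([B, np.ones((B.shape[0], 1))])  # G = [B e2]

alpha = np.array([0.5, 0.5])  # assumed dual solution, for illustration only
eps = 1e-4                    # regularization eps*I against ill-conditioning

# u = -(H^T H + eps*I)^{-1} G^T alpha, stacking w^(1) and b^(1)
u = -np.linalg.solve(H.T @ H + eps * np.eye(H.shape[1]), G.T @ alpha)
w1, b1 = u[:-1], u[-1]
```

Using `np.linalg.solve` rather than forming the explicit inverse is the usual numerically preferable choice.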
Similarly, the dual problem of (2) is obtained as

max_γ  e_1^T γ − (1/2) γ^T H (G^T G)^{−1} H^T γ
s.t.  0 ≤ γ ≤ c_2,

where H = [A e_1], G = [B e_2], and let v = [w^(2), b^(2)]^T. The augmented vector v is given by v = −(G^T G + εI)^{−1} H^T γ.
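Once both augmented vectors are known, a new point is assigned to the class whose hyperplane it lies nearer to. A minimal sketch of this standard TWSVM decision rule (the specific planes in the test below are illustrative assumptions):

```python
import numpy as np

def twsvm_predict(x, w1, b1, w2, b2):
    # Assign x to the class of the nearer of the two non-parallel hyperplanes,
    # using perpendicular distance |w^T x + b| / ||w||;
    # class +1 corresponds to plane 1, class -1 to plane 2.
    d1 = abs(w1 @ x + b1) / np.linalg.norm(w1)
    d2 = abs(w2 @ x + b2) / np.linalg.norm(w2)
    return 1 if d1 <= d2 else -1
```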
For nonlinear classification, Jayadeva et al. [3] extended their results by introducing a kernel. The optimization problems are as follows:

min_{u^(1), b^(1), ξ}  (1/2)||K(A, C^T) u^(1) + e_1 b^(1)||^2 + c_1 e_2^T ξ
s.t.  −(K(B, C^T) u^(1) + e_2 b^(1)) + ξ ≥ e_2,  ξ ≥ 0,

min_{u^(2), b^(2), η}  (1/2)||K(B, C^T) u^(2) + e_2 b^(2)||^2 + c_2 e_1^T η
s.t.  (K(A, C^T) u^(2) + e_1 b^(2)) + η ≥ e_1,  η ≥ 0.
Correspondingly, the dual problems are

max_α  e_2^T α − (1/2) α^T G (H^T H)^{−1} G^T α
s.t.  0 ≤ α ≤ c_1,

max_γ  e_1^T γ − (1/2) γ^T H (G^T G)^{−1} H^T γ
s.t.  0 ≤ γ ≤ c_2,

where now H = [K(A, C^T) e_1] and G = [K(B, C^T) e_2].
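The kernel matrix K(A, C^T) used above can be formed with, for example, a Gaussian kernel. A small sketch, where the width sigma is a user-chosen parameter (an assumption here, not a value from the paper):

```python
import numpy as np

def gaussian_kernel(X, C, sigma=1.0):
    # K[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2)),
    # with the rows of C taken from the stacked training data [A; B]
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```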
The v-twin support vector machine (v-TSVM) [5] solves the following pair of primal problems:

min_{w^(1), b^(1), ρ_1, ξ_2}  (1/2)||Aw^(1) + e_1 b^(1)||^2 − v_1 ρ_1 + (1/l_2) e_2^T ξ_2
s.t.  −(Bw^(1) + e_2 b^(1)) ≥ ρ_1 e_2 − ξ_2,  ξ_2 ≥ 0,  ρ_1 ≥ 0,

min_{w^(2), b^(2), ρ_2, ξ_1}  (1/2)||Bw^(2) + e_2 b^(2)||^2 − v_2 ρ_2 + (1/l_1) e_1^T ξ_1
s.t.  (Aw^(2) + e_1 b^(2)) ≥ ρ_2 e_1 − ξ_1,  ξ_1 ≥ 0,  ρ_2 ≥ 0,

where l_1 and l_2 are the numbers of samples in the two classes.
It is seen that the v-twin support vector machine introduces the parameters v_1 and v_2, which control the numbers of support vectors and margin errors, and v-TSVM also adds the two variables ρ_1 and ρ_2. Its dual problems are

max_α  −(1/2) α^T G (H^T H)^{−1} G^T α
s.t.  0 ≤ α ≤ (1/l_2) e_2,  e_2^T α ≥ v_1,

max_γ  −(1/2) γ^T H (G^T G)^{−1} H^T γ
s.t.  0 ≤ γ ≤ (1/l_1) e_1,  e_1^T γ ≥ v_2,

where H = [A e_1] and G = [B e_2].
When the samples are not linearly separable, kernel-generated surfaces are considered instead of planes: K(x^T, C^T) u^(1) + b^(1) = 0 and K(x^T, C^T) u^(2) + b^(2) = 0, where K is an appropriately chosen kernel. The optimization problems are as follows:

min_{u^(1), b^(1), ρ_1, ξ_2}  (1/2)||K(A, C^T) u^(1) + e_1 b^(1)||^2 − v_1 ρ_1 + (1/l_2) e_2^T ξ_2
s.t.  −(K(B, C^T) u^(1) + e_2 b^(1)) ≥ ρ_1 e_2 − ξ_2,  ξ_2 ≥ 0,  ρ_1 ≥ 0,

min_{u^(2), b^(2), ρ_2, ξ_1}  (1/2)||K(B, C^T) u^(2) + e_2 b^(2)||^2 − v_2 ρ_2 + (1/l_1) e_1^T ξ_1
s.t.  (K(A, C^T) u^(2) + e_1 b^(2)) ≥ ρ_2 e_1 − ξ_1,  ξ_1 ≥ 0,  ρ_2 ≥ 0.

Correspondingly, the dual problems are

max_α  −(1/2) α^T R (S^T S)^{−1} R^T α
s.t.  0 ≤ α ≤ (1/l_2) e_2,  e_2^T α ≥ v_1,

max_γ  −(1/2) γ^T S (R^T R)^{−1} S^T γ
s.t.  0 ≤ γ ≤ (1/l_1) e_1,  e_1^T γ ≥ v_2,

where S = [K(A, C^T) e_1] and R = [K(B, C^T) e_2].
3. FUZZY TWIN SUPPORT VECTOR MACHINES AND THEIR ITERATIVE METHOD
3.1 Fuzzy twin support vector machine (FTWSVM)
For FTWSVM, the primal optimization problems are

min_{w^(1), b^(1), ξ}  (1/2)(Aw^(1) + e_1 b^(1))^T (Aw^(1) + e_1 b^(1)) + c_1 S_A e_2^T ξ
s.t.  −(Bw^(1) + e_2 b^(1)) + ξ ≥ e_2,  ξ ≥ 0,

min_{w^(2), b^(2), η}  (1/2)(Bw^(2) + e_2 b^(2))^T (Bw^(2) + e_2 b^(2)) + c_2 S_B e_1^T η
s.t.  (Aw^(2) + e_1 b^(2)) + η ≥ e_1,  η ≥ 0,
where S_A and S_B denote the fuzzy memberships of the sample points; the terms c_1 S_A and c_2 S_B in the objective functions determine the importance of the training samples. The dual problems are written as
max_α  e_2^T α − (1/2) α^T G (H^T H)^{−1} G^T α
s.t.  0 ≤ α ≤ c_1 S_A,

max_γ  e_1^T γ − (1/2) γ^T H (G^T G)^{−1} H^T γ
s.t.  0 ≤ γ ≤ c_2 S_B.
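The paper does not restate here how the memberships S_A and S_B are computed. A common heuristic in the fuzzy SVM literature, shown purely as an assumed example, grades each sample by its distance from its class centre:

```python
import numpy as np

def class_center_membership(X, delta=1e-6):
    # Membership in (0, 1]: samples near the class centre get weight close
    # to 1, outliers far from the centre get small weight.
    # delta keeps the farthest sample's membership strictly positive.
    center = X.mean(axis=0)
    d = np.linalg.norm(X - center, axis=1)
    return 1.0 - d / (d.max() + delta)
```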
When the samples are not linearly separable, the kernel matrix K(x^T, C^T) = φ(x^T) φ(C^T) is introduced. The kernel-generated surfaces are K(x^T, C^T) u^(1) + b^(1) = 0 and K(x^T, C^T) u^(2) + b^(2) = 0, where C^T = [A B]^T and K(x, y) is the kernel function. The optimization problems for the classification surfaces are

min_{u^(1), b^(1), ξ}  (1/2)||K(A, C^T) u^(1) + e_1 b^(1)||^2 + c_1 S_A e_2^T ξ
s.t.  −(K(B, C^T) u^(1) + e_2 b^(1)) + ξ ≥ e_2,  ξ ≥ 0,

min_{u^(2), b^(2), η}  (1/2)||K(B, C^T) u^(2) + e_2 b^(2)||^2 + c_2 S_B e_1^T η
s.t.  (K(A, C^T) u^(2) + e_1 b^(2)) + η ≥ e_1,  η ≥ 0.
The corresponding dual problems are

max_α  e_2^T α − (1/2) α^T R (S^T S)^{−1} R^T α
s.t.  0 ≤ α ≤ c_1 S_A,

max_γ  e_1^T γ − (1/2) γ^T S (R^T R)^{−1} S^T γ
s.t.  0 ≤ γ ≤ c_2 S_B,

where S = [K(A, C^T) e_1] and R = [K(B, C^T) e_2].
As the parameters c_1 and c_2 can only control the empirical risk, the parameters v_1 and v_2 are introduced; they control the numbers of support vectors and margin errors.
3.2 Fuzzy v-twin support vector machine (v-FTWSVM)
For v-FTWSVM, the primal optimization problems are

min_{w^(1), b^(1), ρ_1, ξ_2}  (1/2)||Aw^(1) + e_1 b^(1)||^2 − v_1 ρ_1 + (1/l_2) S_B^T ξ_2
s.t.  −(Bw^(1) + e_2 b^(1)) ≥ ρ_1 e_2 − ξ_2,  ξ_2 ≥ 0,  ρ_1 ≥ 0,

min_{w^(2), b^(2), ρ_2, ξ_1}  (1/2)||Bw^(2) + e_2 b^(2)||^2 − v_2 ρ_2 + (1/l_1) S_A^T ξ_1
s.t.  (Aw^(2) + e_1 b^(2)) ≥ ρ_2 e_1 − ξ_1,  ξ_1 ≥ 0,  ρ_2 ≥ 0.
Correspondingly, the dual problems are

max_α  −(1/2) α^T G (H^T H)^{−1} G^T α
s.t.  0 ≤ α ≤ S_B / l_2,  e_2^T α ≥ v_1,

max_γ  −(1/2) γ^T H (G^T G)^{−1} H^T γ
s.t.  0 ≤ γ ≤ S_A / l_1,  e_1^T γ ≥ v_2.
When the data samples are not linearly separable, the kernel-generated surfaces are K(x^T, C^T) u^(1) + b^(1) = 0 and K(x^T, C^T) u^(2) + b^(2) = 0, where C^T = [A B]^T and K(x, y) is the kernel function. The optimization problems for the classification surfaces are

min_{u^(1), b^(1), ρ_1, ξ_2}  (1/2)||K(A, C^T) u^(1) + e_1 b^(1)||^2 − v_1 ρ_1 + (1/l_2) S_B^T ξ_2
s.t.  −(K(B, C^T) u^(1) + e_2 b^(1)) ≥ ρ_1 e_2 − ξ_2,  ξ_2 ≥ 0,  ρ_1 ≥ 0,

min_{u^(2), b^(2), ρ_2, ξ_1}  (1/2)||K(B, C^T) u^(2) + e_2 b^(2)||^2 − v_2 ρ_2 + (1/l_1) S_A^T ξ_1
s.t.  (K(A, C^T) u^(2) + e_1 b^(2)) ≥ ρ_2 e_1 − ξ_1,  ξ_1 ≥ 0,  ρ_2 ≥ 0.

The corresponding dual problems are

max_α  −(1/2) α^T R (S^T S)^{−1} R^T α
s.t.  0 ≤ α ≤ S_B / l_2,  e_2^T α ≥ v_1,

max_γ  −(1/2) γ^T S (R^T R)^{−1} S^T γ
s.t.  0 ≤ γ ≤ S_A / l_1,  e_1^T γ ≥ v_2,

where S = [K(A, C^T) e_1] and R = [K(B, C^T) e_2].
3.3 Successive over-relaxation iterative method
The above dual quadratic programming problems of FTWSVM can be transformed into the following form:

min_α  (1/2) α^T P α − e^T α
s.t.  α ∈ S = {α | 0 ≤ α ≤ Ce}.

P is decomposed into L + E + L^T, where the nonzero elements of L ∈ R^{m×m} constitute the strictly lower triangular part of the symmetric matrix P, and the nonzero elements of E ∈ R^{m×m} constitute the diagonal of P. Thus, the Lagrange multipliers can be obtained using the iterative method [11]:

α^{i+1} = (α^i − ωE^{−1}(Pα^i − e + L(α^{i+1} − α^i)))_# .

In solving for α, first choose ω ∈ (0, 2) to ensure convergence. Then, starting from any α^0 ∈ R^m, compute the iterates until ||α^{i+1} − α^i|| is less than some prescribed tolerance, where (·)_# denotes the 2-norm projection onto the feasible region. For the two dual problems of FTWSVM the projection is, componentwise,

((α)_#)_i = 0            if α_i < 0,
            α_i          if 0 ≤ α_i ≤ c_1 (S_A)_i,   i = 1, ..., m2,
            c_1 (S_A)_i  if α_i > c_1 (S_A)_i,

((γ)_#)_i = 0            if γ_i < 0,
            γ_i          if 0 ≤ γ_i ≤ c_2 (S_B)_i,   i = 1, ..., m1,
            c_2 (S_B)_i  if γ_i > c_2 (S_B)_i.
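The componentwise form of this SOR iteration, following Mangasarian and Musicant [11], can be sketched as below; the toy diagonal problem in the test is an assumption for illustration only:

```python
import numpy as np

def sor_box_qp(P, upper, omega=1.0, tol=1e-6, max_iter=10000):
    """SOR for: min 0.5 * a^T P a - e^T a  s.t.  0 <= a <= upper.

    Sweeping the components in order and reusing the already-updated entries
    realizes the L(a^{i+1} - a^i) term of the update
    a^{i+1} = (a^i - omega * E^{-1}(P a^i - e + L(a^{i+1} - a^i)))_# ,
    and the clip is the projection (.)_# onto the box constraints.
    """
    m = P.shape[0]
    a = np.zeros(m)
    for _ in range(max_iter):
        a_prev = a.copy()
        for i in range(m):
            grad_i = P[i] @ a - 1.0               # i-th component of P a - e
            a[i] = a[i] - omega * grad_i / P[i, i]
            a[i] = min(max(a[i], 0.0), upper[i])  # projection onto [0, upper_i]
        if np.linalg.norm(a - a_prev) < tol:
            break
    return a
```

For a separable diagonal P the fixed point can be checked by hand: with P = 2I and upper bound 1, each component converges to 0.5.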
Correspondingly, the above dual problems of v-FTWSVM can be transformed into the following form:

min_α  (1/2) α^T P α
s.t.  α ∈ S = {α | 0 ≤ α ≤ Ce}.

Here, we use the following formula to solve for the Lagrange multipliers:

α^{i+1} = (α^i − ωE^{−1}(Pα^i + L(α^{i+1} − α^i)))_# .

The value range of the Lagrange multipliers is

((α)_#)_i = v_1            if α_i < v_1,
            α_i            if v_1 ≤ α_i ≤ (S_B)_i / l_2,   i = 1, ..., m2,
            (S_B)_i / l_2  if α_i > (S_B)_i / l_2,

((γ)_#)_i = v_2            if γ_i < v_2,
            γ_i            if v_2 ≤ γ_i ≤ (S_A)_i / l_1,   i = 1, ..., m1,
            (S_A)_i / l_1  if γ_i > (S_A)_i / l_1.
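The projection used for v-FTWSVM differs from the FTWSVM one only in its bounds. A small sketch with assumed illustrative bounds:

```python
import numpy as np

def project_interval(a, lower, upper):
    # (.)_# : componentwise 2-norm projection onto the box [lower, upper];
    # for v-FTWSVM the bounds would be lower = v1 and upper_i = (S_B)_i / l2.
    return np.minimum(np.maximum(a, lower), upper)
```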
4. EXPERIMENT
4.1 Experimental data and methods
In order to verify the performance of the fuzzy twin support vector machines based on the iterative method, we choose 10 datasets from the UCI database for our experiments, as seen in Tab. 1. Tenfold cross-validation is used, and the kernel functions chosen are the linear kernel and the Gaussian kernel, respectively. For convenience of comparison, we also include TWSVM, FTWSVM and v-FTWSVM in the experiments.
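Tenfold cross-validation partitions each dataset into ten disjoint folds, training on nine and testing on the held-out one. A minimal index-splitting sketch (the random seed is an arbitrary assumption):

```python
import numpy as np

def tenfold_splits(n_samples, seed=0):
    # Shuffle the sample indices and cut them into 10 (nearly) equal folds;
    # fold k serves as the test set while the other nine form the training set.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, 10)
    for k in range(10):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(10) if j != k])
        yield train, test
```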
Table 1: Characteristics of the data sets

Datasets                  Instances   Attributes
heart-statlog             270         14
sonar                     208         60
ionosphere                351         35
banknote_authentication   1372        6
diabetes                  768         9
breast_cancer             683         11
housing                   506         15
wholesale_customers       440         9
vertebral_column          310         7
parkinsons                197         24
Table 2: Experimental results (accuracy %, mean ± standard deviation)

Datasets                  TWSVM-S      TWSVM        FTWSVM-S     FTWSVM       v-FTWSVM-S   v-FTWSVM
heart-statlog             84.44±6.80   82.59±6.21   84.81±3.40   84.44±2.12   84.07±2.22   84.07±2.22
sonar                     77.00±6.10   75.51±6.30   77.36±2.30   76.41±5.21   77.51±6.50   76.94±4.30
ionosphere                88.48±5.74   74.40±4.20   76.45±3.45   74.38±2.63   77.21±2.71   74.64±3.10
banknote_authentication   97.74±0.23   97.16±0.24   97.81±0.30   97.38±0.89   97.82±0.25   96.87±0.58
diabetes                  76.03±2.45   65.10±1.08   77.06±5.18   73.04±4.15   76.06±3.45   75.92±2.03
breast_cancer             95.15±2.11   94.72±2.47   95.61±3.92   95.47±1.49   95.18±2.73   94.72±2.08
housing                   78.31±2.55   78.34±8.06   79.18±0.73   85.40±0.85   78.78±2.53   85.54±0.75
wholesale_customers       86.36±2.93   81.36±0.46   83.18±5.22   82.27±1.27   89.40±3.35   89.32±0.68
vertebral_column          76.13±1.58   76.07±8.06   78.39±2.91   76.45±3.87   77.74±6.45   75.80±6.27
parkinsons                82.83±3.50   88.50±1.16   89.16±3.34   89.16±1.00   83.50±1.67   83.16±2.16

Table 3: Experimental results (accuracy %, mean ± standard deviation)

Datasets                  TWSVM-S      TWSVM        FTWSVM-S     FTWSVM       v-FTWSVM-S   v-FTWSVM
heart-statlog             84.81±2.00   84.44±4.32   85.92±3.60   85.56±2.84   86.29±4.32   85.56±4.45
sonar                     77.26±3.40   77.26±10.10  75.51±3.10   75.51±2.30   77.46±4.50   77.46±3.14
ionosphere                74.65±3.26   88.03±2.81   78.04±4.25   74.93±3.60   78.38±3.42   74.96±4.50
banknote_authentication   97.41±0.32   97.30±0.47   97.74±0.78   97.53±0.68   97.74±0.56   97.38±0.67
diabetes                  59.80±4.51   64.97±2.52   64.65±7.54   64.43±4.07   76.18±5.48   65.09±3.16
breast_cancer             65.00±4.15   65.03±4.72   65.32±7.05   65.04±5.84   65.06±7.64   64.99±6.07
housing                   84.18±2.08   84.14±1.62   84.18±3.75   83.61±1.03   81.24±6.73   81.25±6.50
wholesale_customers       83.86±0.68   83.68±2.45   86.09±0.46   85.91±1.60   85.22±1.36   84.31±1.14
vertebral_column          79.03±1.29   78.06±2.00   80.96±1.93   80.64±2.00   80.00±2.26   79.35±4.21
parkinsons                75.33±6.01   74.83±6.33   74.16±1.96   79.16±1.50   72.67±2.30   72.50±1.00

Table 4: Experimental results (accuracy %, mean ± standard deviation)

Datasets                  TWSVM-S      TWSVM        FTWSVM-S     FTWSVM       v-FTWSVM-S   v-FTWSVM
heart-statlog             82.96±4.67   84.44±2.13   85.72±4.32   85.65±3.44   85.18±4.51   84.81±4.32
sonar                     89.64±6.11   80.32±3.22   82.36±3.67   80.61±2.40   78.27±3.46   76.46±2.13
ionosphere                87.46±3.40   86.73±7.40   93.16±2.12   92.58±3.10   92.57±3.20   92.32±1.31
banknote_authentication   99.05±0.10   97.67±0.21   98.10±0.37   97.89±0.14   99.05±0.10   98.25±0.09
diabetes                  76.06±5.01   65.12±3.08   77.60±4.82   77.22±4.51   74.97±5.06   65.10±4.05
breast_cancer             64.27±5.08   64.02±4.56   65.72±5.81   65.02±5.81   65.71±6.25   65.60±4.17
housing                   79.28±2.07   79.23±6.26   85.39±1.73   80.75±5.23   78.46±2.37   77.28±1.74
wholesale_customers       87.95±1.82   83.86±0.91   87.72±0.68   83.64±0.68   87.27±0.72   84.09±0.91
vertebral_column          78.71±1.30   78.32±3.61   80.35±1.65   80.00±1.29   81.29±2.62   78.06±0.97
parkinsons                75.33±1.00   85.67±1.17   85.50±5.83   85.50±1.16   77.00±6.33   76.83±1.01
From the results in Tab. 2 to Tab. 4, we can see that whether the data set is linearly separable or not, the accuracy of the fuzzy twin support vector machines based on the iterative method generally increases. For example, for ionosphere in Tab. 4, the accuracy of the FTWSVM algorithm is 92.58%, while that of the FTWSVM-S algorithm is 93.16%; the accuracy of the v-FTWSVM algorithm is 92.32%, while that of the v-FTWSVM-S algorithm is 92.57%. To show the time performance of the twin support vector machines based on the iterative method, Table 5 gives the running time of the algorithms under the different methods. We can see that the fuzzy twin support vector machines based on the iterative method run faster than those based on the traditional method. For example, for heart-statlog, the time of the v-FTWSVM algorithm is 0.23 s, while that of the v-FTWSVM-S algorithm is 0.13 s; the time of the FTWSVM algorithm is 0.42 s, while that of the FTWSVM-S algorithm is 0.19 s.
Moreover, we also give the total average running time over all datasets for the different fuzzy twin support vector machines, as seen in Fig. 1. It is seen that the fuzzy twin support vector machines based on the iterative method, FTWSVM-S and v-FTWSVM-S, are superior to those based on the traditional method.
Table 5: Running time (seconds)

Datasets                  v-FTWSVM   FTWSVM   v-FTWSVM-S   FTWSVM-S
heart-statlog             0.23       0.42     0.13         0.19
sonar                     0.24       0.39     0.06         0.18
ionosphere                0.72       0.71     0.25         0.28
banknote_authentication   78.43      77.96    31.43        30.65
diabetes                  11.47      12.5     5.72         6.2
breast_cancer             6.03       6.68     1.73         1.74
housing                   3.71       3.43     1.11         1.16
wholesale_customers       2.25       2.2      0.43         0.44
vertebral_column          0.9        0.9      0.19         0.18
parkinsons                0.24       0.32     0.05         0.05
[Figure 1 omitted: bar chart, y-axis in seconds.]
Figure 1: Total average running time for all datasets using the different fuzzy twin support vector machines (v-FTWSVM, FTWSVM, v-FTWSVM-S, FTWSVM-S).
In summary, the classification performance of the fuzzy twin support vector machines based on the iterative method is comparable to that of the fuzzy twin support vector machines based on the traditional method, and their running time is much shorter.
5. CONCLUSION
In this paper, we study fuzzy twin support vector machines based on an iterative method and compare their performance with those using the traditional method. Experiments are done for TWSVM, FTWSVM, and v-FTWSVM on UCI datasets. The experimental results indicate that with the successive over-relaxation iterative method the training speed is improved while the accuracy remains comparable to that of the traditional method. Of course, what we have obtained are only results on binary classification data and small-scale data. In the future, we need to further study multi-class and large-scale datasets.
References
[1] T. Le, D. Tran, W. Ma, et al., "Robust support vector machine," International Joint Conference on Neural Networks (IJCNN), pp. 4137-4144, 2014.
[2] V. Bloom, I. Griva, B. Kwon, et al., "Exterior-point method for support vector machines," IEEE Transactions on Neural Networks and Learning Systems, 25(7), pp. 1390-1393, 2014.
[3] Jayadeva, R. Khemchandani, S. Chandra, "Twin support vector machines for pattern classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), pp. 905-910, 2007.
[4] Y.H. Shao, C.H. Zhang, N.Y. Deng, et al., "Improvements on twin support vector machines," IEEE Transactions on Neural Networks, 22(6), pp. 962-968, 2011.
[5] X. Peng, "A v-twin support vector machine (v-TSVM) classifier and its geometric algorithms," Information Sciences, 180, pp. 3863-3875, 2010.
[6] S.F. Ding, J.Z. Yu, B.J. Qi, et al., "An overview on twin support vector machines," Artificial Intelligence Review, 42(2), pp. 245-252, 2014.
[7] D. Tomar, S. Agarwal, "Twin support vector machine: a review from 2007 to 2014," Egyptian Informatics Journal, 16, pp. 55-69, 2015.
[8] Y. Tian, Z. Qi, "Review on: twin support vector machines," Annals of Data Science, 1(2), pp. 253-277, 2014.
[9] S.F. Ding, "An improved twin support vector machine," Journal of Liaoning Shihua University, 32(4), 2012.
[10] K. Li, N. Li, X.X. Lu, "Twin support vector machine algorithm with fuzzy weighting," Computer Engineering and Applications, 49(4), pp. 162-165, 2013.
[11] O.L. Mangasarian, D.R. Musicant, "Successive overrelaxation for support vector machines," IEEE Transactions on Neural Networks, 10(5), pp. 1032-1037, 1999.
AUTHOR
Kai Li received the B.S. and M.S. degrees from the mathematics department and the electrical engineering department of Hebei University, Baoding, China, in 1982 and 1992, respectively. He received the Ph.D. degree from Beijing Jiaotong University, Beijing, China, in 2001. He is currently a Professor in the College of Computer Science and Technology, Hebei University. His current research interests include machine learning, data mining, computational intelligence, and pattern recognition.
Shaofang Hu received the bachelor's degree in computer science and technology from the Industrial and Commercial College, Hebei University, Baoding, Hebei, China, in 2013, and she is currently pursuing the M.E. degree in computer science and technology at Hebei University, Baoding, Hebei, China. Her research interests include machine learning, data mining, and pattern recognition.