Support Vector Machines (SVMs) are used for land use classification using ETM+ multispectral and DAIS
hyperspectral data. Results show that the SVM achieves a higher level of classification accuracy
than either a maximum likelihood or a neural network classifier.
1. Introduction
Neural classifiers are now widely used in remote sensing (Benediktsson et al., 1990;
Heermann and Khazenie, 1992). Although neural networks may generally be used to
classify data effectively, there is a range of factors that limit their use (Wilkinson, 1997). Another classification technique
based on statistical learning theory (Vapnik, 1995) called Support Vector Machines
(SVM) has recently been applied to the problem of remote sensing data classification
(Huang et al., 2002; Zhu and Blumberg, 2002). This technique is based on statistical
learning theory and the optimal separation of classes (Vapnik 1995; Vapnik and
Chervonenkis 1971). In the case of a two-class pattern recognition problem, if the classes
are linearly separable, this technique selects, from among the infinite number of linear
Pal and Mather Support Vector Machines
classifiers that separate the data, the one that minimises the generalisation error. Thus,
the selected hyperplane will be one that leaves the maximum margin between the two
classes, where margin is defined as the sum of the distances of the hyperplane from the
closest points of the two classes (Vapnik, 1995). This problem of maximising the
margin can be solved using quadratic programming (QP) techniques. The data points that are closest to the hyperplane are used to measure the
margin, hence these data points are termed support vectors. The number of support
vectors is thus small as they are points close to the class boundaries (Vapnik, 1995).
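As an illustration of the maximum-margin idea, the following sketch fits a linear SVM to a toy two-class problem and counts the support vectors. It uses scikit-learn's SVC and synthetic data, both assumptions of this sketch rather than the software or imagery of this paper:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two linearly separable clusters (synthetic, not the paper's data).
class_a = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))
class_b = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the training points closest to the separating hyperplane become
# support vectors, so they are few relative to the whole training set.
print(len(clf.support_vectors_), "support vectors out of", len(X), "points")
```

The separating hyperplane is fully determined by these few points; discarding the other training samples would leave the classifier unchanged.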
If the two classes are linearly non-separable, the SVMs try to find the hyperplane that
maximises the margin and that, at the same time, minimises a quantity proportional to
the number of misclassification errors. The trade-off between margin and
misclassification error is controlled by a positive constant, C, that has to be defined
beforehand (Cortes and Vapnik, 1995). The technique of designing SVMs can be
extended for non-linear decision surfaces also. Boser et al. (1992) suggested projecting
input data into a high dimensional feature space through some nonlinear mapping, in
which a linear decision surface can then be constructed. Kernel
functions were used to reduce the computational cost of operating in this feature space (Vapnik,
1995). Detailed discussions of SVMs can be found in Vapnik (1995) and Cristianini and
Shawe-Taylor (2000).
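The effect of the kernel can be sketched on data that no linear hyperplane can separate in the input space. The concentric-ring data and the scikit-learn calls below are illustrative assumptions, not this paper's data or software; the γ value is borrowed from the result reported in section 3:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 200
# Two concentric rings: no straight line in the input space separates them.
angles = rng.uniform(0.0, 2.0 * np.pi, n)
radii = np.where(np.arange(n) < n // 2, 1.0, 3.0)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X += rng.normal(scale=0.1, size=X.shape)
y = (np.arange(n) >= n // 2).astype(int)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

# The RBF kernel's implicit high-dimensional mapping makes the rings
# separable, so the kernelised SVM far outperforms the linear one here.
print(linear.score(X, y), rbf.score(X, y))
```

The kernel trick computes inner products in the high dimensional feature space directly from the input vectors, so the mapping itself is never evaluated explicitly.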
The SVM was initially designed for binary (two-class) problems. When dealing with
multiple classes, an appropriate multi-class method is needed. Initially, the one against the
rest approach (Vapnik, 1995) was the most commonly implemented SVM multi-class classification
technique. This multi-class technique compares a given class with all the others put
together, thus generating n classifiers, where n is the number of classes. The final output
of these SVMs is the class that corresponds to the SVM with the largest margin. For
multi-class data set problems one has to determine n hyperplanes. Thus, this method
requires the solution of n QP optimisation problems, each of which separates one class
from the remaining n − 1 classes. In contrast, the one against one approach (Knerr et al., 1990) applies pairwise comparisons between
classes. In this method, all possible two-class classifiers are evaluated from the
training set of n classes, each classifier being trained on only two out of n classes.
Thus, n(n − 1)/2 classifiers are generated. Applying each classifier to the feature
vectors of the test data gives one vote to the winning class. The data point is assigned
to the class with the maximum number of votes.
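The two multi-class strategies can be sketched with scikit-learn's generic wrappers, which here stand in for the RHUL_SVM and LIBSVM implementations; the seven-class synthetic data is a placeholder for the Littleport classes:

```python
import numpy as np
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n_classes = 7  # as for the seven Littleport land cover types
centres = rng.uniform(-10.0, 10.0, size=(n_classes, 4))
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 4)) for c in centres])
y = np.repeat(np.arange(n_classes), 30)

ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)  # one against the rest
ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)   # one against one

print(len(ovr.estimators_))  # n = 7 binary classifiers
print(len(ovo.estimators_))  # n(n - 1)/2 = 21 binary classifiers
```

Although one against one trains more binary classifiers, each is trained on only two classes' samples, so the individual QP problems are much smaller.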
2. Data and Methods
Two test areas were used in this study, located near Littleport in the UK and in the La Mancha region of Spain, respectively. For the Littleport area,
ETM+ data acquired on 19th June 2000 were used. The classification problem involves
the identification of seven land cover types (wheat, potato, sugar beet, onion, peas,
lettuce and beans). For the La Mancha area, hyperspectral data acquired on 29th June
2000 by the DAIS 7915 airborne imaging spectrometer were used. Eight different land
cover types (wheat, water body, dry salt lake, hydrophytic vegetation, vineyards, bare
In this study, both the "one against one" and the "one against the rest" approaches to
generating multi-class SVMs were used. Two SVM-based software packages, each using a
different approach to solving the quadratic optimisation problem, were used. The first of these,
RHUL_SVM, uses both the one against one and the one against the rest approaches
while the other, LIBSVM, is based on a one against one approach. Results obtained
using these SVMs were compared with results derived from more traditional Maximum
Likelihood (ML) and Neural Network (NN, using SNNS software) classifiers. For this
study, a standard back-propagation neural classifier with one hidden layer of twenty-six
nodes was used. All other factors affecting the neural network classifier were set to their default values.
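A minimal stand-in for this benchmark classifier is sketched below, assuming scikit-learn's MLPClassifier in place of the SNNS software and synthetic clusters in place of the ETM+ pixels; the band count is also an assumption:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
n_classes, n_bands = 7, 6  # seven Littleport classes; six bands (assumed)
centres = rng.uniform(-2.0, 2.0, size=(n_classes, n_bands))
X = np.vstack([c + rng.normal(scale=0.1, size=(100, n_bands)) for c in centres])
y = np.repeat(np.arange(n_classes), 100)

# Standard back-propagation network with one hidden layer of 26 nodes.
nn = MLPClassifier(hidden_layer_sizes=(26,), max_iter=2000, random_state=0)
nn.fit(X, y)
print(nn.score(X, y))
```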
Ground reference pixels for both test areas were selected using a random sampling
procedure. Selected pixels were divided so as to remove any possible bias caused by
using the same pixels for testing and training the classifiers. A total of 2700 training and
2037 test pixels with ETM+ data and 800 (100 pixels/class) training and 3800 test
pixels with DAIS data were used. To control over-training of the NN, a separate validation data set was also used.
3. Results
The concept of the kernel was introduced earlier to extend the SVM to deal with non-
linear decision surfaces. There is little guidance in the literature to the best choice of
kernel and kernel specific parameters. A number of trials were carried out using five
different kernels with different kernel specific parameters by training and testing the
classifier. A radial basis kernel function with parameters γ = 2 and C = 5000 gave the
highest overall classification accuracy. Both one against one and one against the rest
strategies were used. Table 1 lists the training times taken using a Sun workstation and
the classification accuracies achieved. Results obtained using ML and NN classifiers are
given in table 2.
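The trial-and-error search over kernels and kernel-specific parameters described above can be sketched as a grid search. The candidate grid and the synthetic data below are assumptions; only the winning setting (an RBF kernel with γ = 2 and C = 5000) comes from the text:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(4)
n_classes = 7
centres = rng.uniform(-3.0, 3.0, size=(n_classes, 6))
X = np.vstack([c + rng.normal(scale=0.3, size=(60, 6)) for c in centres])
y = np.repeat(np.arange(n_classes), 60)

# Candidate kernels and parameters (an assumed grid; the paper tried five
# kernels but does not list the exact candidate values).
param_grid = [
    {"kernel": ["linear"], "C": [1, 100, 5000]},
    {"kernel": ["poly"], "degree": [2, 3], "C": [1, 100, 5000]},
    {"kernel": ["rbf"], "gamma": [0.5, 2], "C": [1, 100, 5000]},
]
search = GridSearchCV(SVC(), param_grid, cv=3).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Cross-validated accuracy is used here to rank the candidates; the paper's trials ranked settings by training and testing the classifier directly.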
The results show that the time taken using the one against the rest method is much
higher, and the classification accuracies lower, than with the one against one technique.
This result suggests that the one against one method should be employed for
generating multi-class SVMs. One reason for this finding could be the unbalanced sizes
of the two training classes when using the one against the rest multi-class method. The
level of classification accuracy achieved by the one against one SVM is also higher than
that achieved by the ML and NN classifiers (table 2).
In the second part of the experiment, DAIS hyperspectral data were used to study the
behaviour of the SVM, ML and NN classifiers with a fixed-size training set (800 pixels)
and an increasing number of features (spectral bands). A total of 65 features was used,
as seven bands with severe striping were rejected. The procedure began with the use of
five bands. An additional five bands were added at each cycle, thus generating thirteen
sets of results.
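The band-incrementing experiment can be sketched as follows. The synthetic Gaussian classes are a stand-in for the DAIS pixels; with such idealised data, accuracy typically keeps rising as bands are added, unlike the slight decline observed with the real data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
n_classes, n_bands = 8, 65  # eight La Mancha classes, 65 usable DAIS bands
centres = rng.uniform(-1.0, 1.0, size=(n_classes, n_bands))
X_train = np.vstack([c + rng.normal(scale=0.4, size=(100, n_bands)) for c in centres])
y_train = np.repeat(np.arange(n_classes), 100)  # 800 training pixels, 100/class
X_test = np.vstack([c + rng.normal(scale=0.4, size=(50, n_bands)) for c in centres])
y_test = np.repeat(np.arange(n_classes), 50)

# Fixed-size training set; grow the feature subset five bands at a time.
accuracies = {}
for k in range(5, 66, 5):  # 5, 10, ..., 65 bands -> thirteen sets of results
    clf = SVC(kernel="rbf", C=5000).fit(X_train[:, :k], y_train)  # C as in section 3
    accuracies[k] = clf.score(X_test[:, :k], y_test)
print(accuracies)
```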
Figure 1 suggests that the performance of an SVM is acceptable even with a small
training data set size, in comparison with other classifiers. Results also suggest that
classification accuracy using the SVM, ML and NN classifiers declines slightly when the number
of features exceeds 50 or so. SVMs are based on the principle of optimal separation of
classes. Thus, the error on test data set will not depend on the dimensionality of the
input space (Vapnik, 1995). The slight performance degradation after 55 bands may be
attributed to the quality of the training data as new features are added.
4. Conclusions
Comparison of the results obtained by the SVM and those produced by other
classifiers suggests that the SVM classifier generally achieves higher classification
accuracies. Like neural classifiers, the effective use of SVM depends on the values of
a few user-defined parameters. Huang et al. (2002) discuss some of the factors affecting
the performance of SVM in detail. This study concludes that the approach used by
Huang et al. (2002) is not optimal for multi-class classification for two reasons: (1) for
their study they replicated samples of the smaller classes, thus increasing the number of
training patterns, and (2) they used a one against the rest strategy for generating the SVMs.
The main problem with the one against the rest approach is that it can produce highly
unbalanced training sets for the two classes being separated. The larger
training time for the one against one approach with RHUL_SVM may be due to the
approach it uses to solve the quadratic optimisation problem.
The study reported here supports the use of a one against one multi-class
strategy, and suggests that SVMs perform well with high dimensional data even with a small number of training samples.
Acknowledgement
The RHUL_SVM software was made available by AT&T, Royal Holloway College,
University of London. The DAIS data were kindly made available by Prof. J. Gumuzzio.
This research was supported by a Commonwealth Scholarship. The authors are grateful for the critical comments of
two anonymous referees, whose advice has led to an improvement in the presentation of this paper.
References
BENEDIKTSSON, J. A., SWAIN, P. H. and ERSOY, O. K., 1990, Neural network approaches versus statistical methods in classification of multisource remote sensing data. IEEE Transactions on Geoscience and Remote Sensing, 28, 540-551.
BOSER, B., GUYON, I. and VAPNIK, V. N., 1992, A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (Pittsburgh: ACM), pp. 144-152.
CHANG, C. and LIN, C., 2001, LIBSVM: A Library for Support Vector Machines.
University, Taiwan. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
CORTES, C. and VAPNIK, V. N., 1995, Support vector networks. Machine Learning, 20, 273-297.
Nottingham, UK.
Springer.
SAUNDERS, C., STITSON, M. O., WESTON, J., BOTTOU, L., SCHÖLKOPF, B. and SMOLA, A., 1998, Support vector machine reference manual. Royal Holloway, University of London.
VAPNIK, V. N., 1995, The Nature of Statistical Learning Theory. New York: Springer-
Verlag.
WILKINSON, G. G., 1997, Open questions in neurocomputing for earth observation. In Neuro-Computation in Remote Sensing Data Analysis (Berlin: Springer-Verlag), pp. 3-13.
ZHU, G. and BLUMBERG, D. G., 2002, Classification using ASTER data and SVM
algorithms: the case study of Beer Sheva, Israel. Remote Sensing of Environment,
80, 233-240.
Figure 1. Classification accuracy (%) of the maximum likelihood, neural network and support vector machine classifiers plotted against the number of DAIS bands used. The plotted values are:

Number of bands          5    10   15   20   25   30   35   40   45   50   55   60   65
Maximum likelihood      64.8 75.7 80.6 80.4 84.3 85.9 88.8 89.1 89.3 88.7 88.9 87.4 85.8
Neural network          48.6 66.7 77.2 79.6 83.3 86.6 85.6 88.7 88.5 89.4 89.4 89.6 88.4
Support vector machines 66.7 74.7 83.5 84.8 87.2 90.5 91.5 92.1 92.3 93.4 94.0 93.4 93.6
Table 1
Table 2

Classifier   Accuracy (%) and Kappa value   Training time (CPU minutes)
ML           82.9 (0.80)                    0.20
NN           85.1 (0.83)                    58
Captions
Table Captions
Table 1. Classification accuracy and training time using SVMs and different multi-class
methods with Littleport ETM+ data.