

Support Vector Machines for Classification in Remote Sensing

Mahesh Pal and P. M. Mather


School of Geography
University of Nottingham
UK

Abstract. Support Vector Machines (SVMs) have recently been introduced

in machine learning for pattern recognition. In this paper, multi-class SVMs

are used for land cover classification using ETM+ multispectral and DAIS

hyperspectral data. Results show that the SVM achieves a higher level of

classification accuracy than either the maximum likelihood or the

backpropagation neural network classifier. The SVM classifier can produce

higher accuracies with smaller training datasets and high-dimensional data.

1. Introduction

Neural classifiers are now widely used in remote sensing (Benediktsson et al., 1990;

Heermann and Khazenie, 1992). Although neural networks may generally be used to

classify data at least as accurately as statistical classification approaches, there is a

range of factors that limit their use (Wilkinson, 1997). Another classification technique

based on statistical learning theory (Vapnik, 1995), called Support Vector Machines

(SVMs), has recently been applied to the problem of remote sensing data classification

(Huang et al., 2002; Zhu and Blumberg, 2002). This technique rests on the principle of

the optimal separation of classes (Vapnik, 1995; Vapnik and

Chervonenkis, 1971). In the case of a two-class pattern recognition problem, if the classes

are linearly separable, this technique selects, from among the infinite number of linear

classifiers that separate the data, the one that minimises the generalisation error. Thus,

the selected hyperplane will be one that leaves the maximum margin between the two

classes, where the margin is defined as the sum of the distances from the hyperplane to the

closest points of the two classes (Vapnik, 1995). This problem of maximising the

margin can be solved using standard Quadratic Programming (QP) optimisation

techniques. The data points that are closest to the hyperplane are used to measure the

margin, hence these data points are termed support vectors. The number of support

vectors is thus small as they are points close to the class boundaries (Vapnik, 1995).
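
For reference, the optimisation problem described above can be written in its standard form (following Vapnik, 1995); here the x_i are the training vectors and y_i in {-1, +1} are their class labels:

\[
\min_{\mathbf{w},\,b}\; \tfrac{1}{2}\|\mathbf{w}\|^{2}
\quad \text{subject to} \quad
y_{i}\,(\mathbf{w}\cdot\mathbf{x}_{i} + b) \ge 1, \qquad i = 1,\dots,N.
\]

The separating hyperplane is w·x + b = 0 and the margin equals 2/||w||, so minimising ||w|| maximises the margin; this is the QP problem referred to above.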

If the two classes are not linearly separable, the SVM tries to find the hyperplane that

maximises the margin and that, at the same time, minimises a quantity proportional to

the number of misclassification errors. The trade off between margin and

misclassification error is controlled by a positive constant that has to be chosen

beforehand (Cortes and Vapnik, 1995). The SVM approach can also be

extended to non-linear decision surfaces. Boser et al. (1992) suggested projecting

input data into a high dimensional feature space through some nonlinear mapping and

formulating a linear classification problem in that feature space. Further, kernel

functions were introduced to reduce the computational cost of operating in the feature space (Vapnik,

1995). Detailed discussions of SVMs can be found in Vapnik (1995) and Cristianini and

Shawe-Taylor (2000).
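
As a sketch of these two extensions (standard formulations, not reproduced from this paper): the non-separable case introduces slack variables ξ_i and the trade-off constant C mentioned above, and a kernel function K(x_i, x_j) supplies inner products in the feature space without computing the mapping explicitly. For the radial basis kernel used later in this study,

\[
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\; \tfrac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{N}\xi_{i}
\quad \text{subject to} \quad
y_{i}\,(\mathbf{w}\cdot\mathbf{x}_{i} + b) \ge 1 - \xi_{i}, \quad \xi_{i} \ge 0,
\]

\[
K(\mathbf{x}_{i}, \mathbf{x}_{j}) = \exp\!\left(-\gamma\,\|\mathbf{x}_{i} - \mathbf{x}_{j}\|^{2}\right).
\]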

The SVM was initially designed for binary (two-class) problems. When dealing with

multiple classes, an appropriate multi-class method is needed. Initially, the one against the

rest approach (Vapnik, 1995) was the most widely implemented SVM multi-class

classification technique. This technique compares a given class with all the others put

together, thus generating n classifiers, where n is the number of classes. The final output

is the class that corresponds to the SVM with the largest margin. For an n-class

problem, one has to determine n hyperplanes. Thus, this method requires the solution of

n QP optimisation problems, each of which separates one class from the others.
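
The decision rule can be illustrated with a short sketch, written here in Python using scikit-learn (which is not the software used in this study; the function and variable names are illustrative):

    import numpy as np
    from sklearn.svm import SVC

    def one_against_rest(X_train, y_train, X_test, classes, **svm_params):
        """Train one binary SVM per class (that class against all the
        others) and label each test sample by the classifier with the
        largest decision value."""
        scores = np.empty((len(X_test), len(classes)))
        for k, c in enumerate(classes):
            clf = SVC(**svm_params)
            clf.fit(X_train, (y_train == c).astype(int))  # class c vs. rest
            scores[:, k] = clf.decision_function(X_test)
        return np.asarray(classes)[scores.argmax(axis=1)]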

Another way is to combine several classifiers in a pair-wise approach: the one

against one approach (Knerr et al., 1990) applies pairwise comparisons between

classes. In this method, all possible two-class classifiers are evaluated from the

training set of n classes, each classifier being trained on only two out of n classes.

There would be a total of n(n-1)/2 classifiers. Applying each classifier to the

vectors of the test data gives one vote to the winning class. Each test vector is assigned

the label of the class with the most votes.
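
The pairwise voting scheme can be sketched in the same illustrative style (again, not the RHUL_SVM or LIBSVM implementation; the inputs are assumed to be NumPy arrays):

    from collections import Counter
    from itertools import combinations

    from sklearn.svm import SVC

    def one_against_one(X_train, y_train, X_test, classes, **svm_params):
        """Train n(n-1)/2 pairwise SVMs; each casts one vote per test
        sample and the class with the most votes wins."""
        votes = [Counter() for _ in range(len(X_test))]
        for a, b in combinations(classes, 2):
            mask = (y_train == a) | (y_train == b)
            clf = SVC(**svm_params)
            clf.fit(X_train[mask], y_train[mask])
            for i, label in enumerate(clf.predict(X_test)):
                votes[i][label] += 1
        return [v.most_common(1)[0][0] for v in votes]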

2. Data and methodology used


The two study areas used in the work reported here are located near the town of

Littleport in the UK and in the La Mancha region of Spain. For the Littleport area,

ETM+ data acquired on 19th June 2000 were used. The classification problem involves

the identification of seven land cover types (wheat, potato, sugar beet, onion, peas,

lettuce and beans). For the La Mancha area, hyperspectral data acquired on 29th June

2000 by the DAIS 7915 airborne imaging spectrometer were used. Eight different land

cover types (wheat, water body, dry salt lake, hydrophytic vegetation, vineyards, bare

soil, pasture lands and built up area) were used.



In this study, both the "one against one" and the "one against the rest" approaches to

generating multi-class SVMs were used. Two SVM-based software packages using

different approaches to solving the quadratic optimisation problem were used. The first of these,

RHUL_SVM (Saunders et al., 1998), uses both the one against one and the one against the rest approaches

while the other, LIBSVM (Chang and Lin, 2001), is based on the one against one approach. Results obtained

using these SVMs were compared with results derived from more traditional Maximum

Likelihood (ML) and Neural Network (NN, using SNNS software) classifiers. For this

study, a standard back-propagation neural classifier with one hidden layer of twenty-six

nodes was used. All other factors affecting the neural network classifier were set as

recommended by Kavzoglu (2001).
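
For readers wishing to reproduce a comparable setup, a minimal modern sketch is given below. It assumes scikit-learn, whose SVC class wraps LIBSVM and applies the one against one strategy internally; the arrays are random placeholders standing in for the ETM+ pixels, and the kernel parameters are those reported in Section 3:

    import numpy as np
    from sklearn.svm import SVC

    # Placeholder data: the study used 2700 training and 2037 test pixels
    # drawn from seven land cover classes (six bands assumed here).
    rng = np.random.default_rng(0)
    X_train = rng.random((2700, 6)); y_train = rng.integers(0, 7, 2700)
    X_test = rng.random((2037, 6)); y_test = rng.integers(0, 7, 2037)

    # Radial basis kernel with the parameters reported in Section 3.
    svm = SVC(kernel="rbf", gamma=2.0, C=5000.0).fit(X_train, y_train)
    print("overall accuracy: %.1f%%" % (100 * svm.score(X_test, y_test)))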

Ground reference pixels for both test areas were selected using a random sampling

procedure. Selected pixels were divided into disjoint sets to remove any possible bias caused by

using the same pixels for both training and testing the classifiers. A total of 2700 training and

2037 test pixels with ETM+ data and 800 (100 pixels/class) training and 3800 test

pixels with DAIS data were used. To control over-training of the NN, a validation data set of

60 pixels/class was also used.

3. Results
The concept of the kernel was introduced earlier to extend the SVM to deal with non-

linear decision surfaces. There is little guidance in the literature on the best choice of

kernel and kernel specific parameters. A number of trials were carried out using five

different kernels with different kernel-specific parameters, training and testing the

classifier each time. A radial basis kernel function with parameters γ = 2 and C = 5000 gave the

highest overall classification accuracy. Both one against one and one against the rest

strategies were used. Table 1 lists the training times taken using a Sun workstation and

the classification accuracies achieved. Results obtained using ML and NN classifiers are

given in table 2.

[Insert table 1 about here]

[Insert table 2 about here]
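
The kernel trials described above can be organised as a simple grid search. The sketch below uses illustrative parameter values, not the exact grid tried in the study, and the same kind of placeholder data as the earlier sketch:

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.random((500, 6)); y_train = rng.integers(0, 7, 500)

    # Candidate kernels and kernel-specific parameters (illustrative only).
    param_grid = [
        {"kernel": ["rbf"], "gamma": [0.5, 1, 2, 4], "C": [10, 1000, 5000]},
        {"kernel": ["poly"], "degree": [2, 3, 4], "C": [10, 1000, 5000]},
        {"kernel": ["linear"], "C": [10, 1000, 5000]},
    ]
    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X_train, y_train)
    print(search.best_params_, search.best_score_)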

The results show that the time taken using the one against the rest method is much

higher, and the classification accuracies are lower, than with the one against one technique.

This result suggests that the one against one method should be employed for

generating multi-class SVMs. One reason for this finding could be the unbalanced sizes

of the two training classes that arise when using the one against the rest method. The

level of classification accuracy achieved by the one against one SVM is higher than

that produced by either the ML or the NN classifier.
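
The figures in Tables 1 and 2 report both the overall accuracy and the Kappa coefficient of agreement. As a reminder, Kappa can be computed from the confusion matrix as follows (a standard formula, not specific to this study):

    import numpy as np

    def kappa(confusion):
        """Cohen's kappa: (observed agreement - chance agreement)
        divided by (1 - chance agreement)."""
        cm = np.asarray(confusion, dtype=float)
        n = cm.sum()
        p_o = np.trace(cm) / n                       # overall accuracy
        p_e = (cm.sum(0) * cm.sum(1)).sum() / n**2   # chance agreement
        return (p_o - p_e) / (1 - p_e)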

In the second part of the experiment, DAIS hyperspectral data were used to study the

behaviour of the SVM, ML and NN classifiers with a fixed-size training set (800 pixels)

and an increasing number of features (spectral bands). A total of 65 features was used,

as seven bands with severe striping were rejected. The procedure began with the use of

five bands. An additional five bands were added at each cycle, thus generating thirteen

sets of results.
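
This incremental-feature procedure is straightforward to script. The sketch below uses placeholder data of the same shape as the DAIS experiment and the same illustrative SVC settings as before:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.random((800, 65)); y_train = rng.integers(0, 8, 800)
    X_test = rng.random((3800, 65)); y_test = rng.integers(0, 8, 3800)

    # Fixed training set, growing number of spectral bands (5 at a time),
    # giving thirteen sets of results for band counts 5, 10, ..., 65.
    for n_bands in range(5, 70, 5):
        clf = SVC(kernel="rbf", gamma=2.0, C=5000.0)
        clf.fit(X_train[:, :n_bands], y_train)
        acc = 100 * clf.score(X_test[:, :n_bands], y_test)
        print(f"{n_bands:2d} bands: {acc:.1f}%")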

[Insert figure 1 about here]

Figure 1 suggests that, in comparison with the other classifiers, the performance of the SVM

remains acceptable even with a small training data set. Results also suggest that

classification accuracy using the SVM, ML and NN declines slightly when the number

of features exceeds 50 or so. SVMs are based on the principle of the optimal separation of

classes. Thus, the error on the test data set should not depend on the dimensionality of the

input space (Vapnik, 1995). The slight performance degradation beyond 55 bands may be

attributed to the quality of the training data as new features are added.

4. Conclusions
Comparison of the results obtained by the SVM and those produced by other

classifiers suggests that the SVM classifier generally achieves higher classification

accuracies. Like neural classifiers, the effective use of the SVM depends on the values of

a few user-defined parameters. Huang et al. (2002) discuss some of the factors affecting

the performance of SVMs in detail. This study concludes that the approach used by

Huang et al. (2002) is not optimal for multi-class classification for two reasons: (1)

they replicated samples of the smaller class, thus increasing the number of

training patterns, and (2) they used a one against the rest strategy for generating the SVM.

The main problem with one against the rest is that it may lead to the existence of

unclassified data, thus producing lower classification accuracies. Further, the higher

training time of the one against one approach in RHUL_SVM may be due to the

technique used to solve the quadratic programming optimisation problem.

The study reported here supports the use of a one against one multi-class

approach for multi-class image classification problems. Finally, the SVM seems to

perform well with high-dimensional data even with a small amount of training data.

Acknowledgement
The RHUL_SVM software was made available by AT&T, Royal Holloway College,

University of London. The DAIS data were kindly made available by Prof. J. Gumuzzio

of the Autonomous University of Madrid. Computing facilities were provided by the

School of Geography, University of Nottingham. Mahesh Pal's research was supported

by a Commonwealth Scholarship. The authors are grateful for the critical comments of

two anonymous referees, whose advice has led to an improvement in the presentation of

many of the findings contained in this paper.

References

BENEDIKTSSON, J. A., SWAIN, P. H. and ERSOY, O. K., 1990, Neural network

approaches versus statistical methods in classification of multisource remote

sensing data. IEEE Transactions on Geoscience and Remote Sensing, 28, 540-551.

BOSER, B., GUYON, I. and VAPNIK, V. N., 1992, A training algorithm for optimal

margin classifiers. Proceedings of the 5th Annual Workshop on Computational Learning

Theory, Pittsburgh, PA: ACM, 144-152.

CHANG, C. and LIN, C., 2001, LIBSVM: A Library for Support Vector Machines.

Department of Computer Science and Information Engineering, National Taiwan

University, Taiwan. http://www.csie.ntu.edu.tw/~cjlin/libsvm.

CORTES, C. and VAPNIK, V. N., 1995, Support-vector networks. Machine Learning,

20, 273-297.

CRISTIANINI, N. and SHAWE-TAYLOR, J., 2000, An Introduction to Support Vector

Machines. Cambridge: Cambridge University Press.



HEERMANN, P. D. and KHAZENIE, N., 1992, Classification of multispectral remote

sensing data using a back-propagation neural network. IEEE Transactions on

Geoscience and Remote Sensing, 30, 81-88.

HUANG, C., DAVIS, L. S. and TOWNSHEND, J. R. G., 2002, An assessment of

support vector machines for land cover classification. International Journal of

Remote Sensing, 23, 725-749.

KAVZOGLU, T., 2001, An Investigation of the Design and Use of Feed-forward

Artificial Neural Networks in the Classification of Remotely Sensed Images.

PhD thesis. School of Geography, The University of Nottingham,

Nottingham, UK.

KNERR, S., PERSONNAZ, L. and DREYFUS, G., 1990, Single-layer learning

revisited: a stepwise procedure for building and training a neural network. In

Neurocomputing: Algorithms, Architectures and Applications, NATO ASI Series,

Berlin: Springer.

SAUNDERS, C., STITSON, M. O., WESTON, J., BOTTOU, L., SCHÖLKOPF, B. and

SMOLA, A., 1998, Support Vector Machine - Reference Manual. Technical

Report, CSD-TR-98-03, Royal Holloway and AT&T, University of London.

VAPNIK, V. N., 1995, The Nature of Statistical Learning Theory. New York: Springer-

Verlag.

WILKINSON, G. G., 1997, Open questions in neurocomputing for Earth observation.

In Neurocomputation in Remote Sensing Data Analysis. New York: Springer-

Verlag, 3-13.

ZHU, G. and BLUMBERG, D. G., 2002, Classification using ASTER data and SVM

algorithms: the case study of Beer Sheva, Israel. Remote Sensing of Environment,

80, 233-240.
[Figure 1 appears here: classification accuracy (%) plotted against the number of
spectral bands (5 to 65) for the three classifiers. The values underlying the chart are:

Number of bands            5    10   15   20   25   30   35   40   45   50   55   60   65
Maximum likelihood        64.8 75.7 80.6 80.4 84.3 85.9 88.8 89.1 89.3 88.7 88.9 87.4 85.8
Neural network            48.6 66.7 77.2 79.6 83.3 86.6 85.6 88.7 88.5 89.4 89.4 89.6 88.4
Support vector machines   66.7 74.7 83.5 84.8 87.2 90.5 91.5 92.1 92.3 93.4 94.0 93.4 93.6]

Figure 1.
Table 1

Multi-class method            Number of        Accuracy (%)     Training time
                              training pixels  and Kappa value  (CPU minutes)
One against rest (RHUL_SVM)   2700             79.73 (0.77)     505.27
One against one (RHUL_SVM)    2700             87.37 (0.86)     21.54
One against one (LIBSVM)      2700             87.90 (0.87)     0.30
Table 2

Classifier   Accuracy (%) and Kappa value   Training time (CPU minutes)
ML           82.9 (0.80)                    0.20
NN           85.1 (0.83)                    58

Captions

Figure 1. Variation in classification accuracy with increasing number of features and

a fixed training data set.



Table Captions

Table 1. Classification accuracy and training time using SVMs and different multi-class
methods with Littleport ETM+ data.

Table 2. Classification accuracies of the ML and NN classifiers with Littleport ETM+


data.
